Detecting cancer cell of origin

ABSTRACT

Methods and compositions are provided for determining a pan-cancer clustering of cluster assignments (COCA) subtype of a cancer in an individual by detecting the expression level of at least one classifier biomarker selected from a group of classifier biomarkers for COCA subtypes. Also provided herein are methods and compositions for determining the response of an individual with a COCA subtype to a therapy such as immunotherapy.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Application No. 62/743,256 filed Oct. 9, 2018 and U.S. Provisional Application No. 62/819,893 filed Mar. 18, 2019, each of which is incorporated by reference herein in its entirety for all purposes.

FIELD

The present invention relates to methods for determining an integrated, pan-cancer subtype and for predicting the prognosis of a patient afflicted with said integrated subtype of cancer.

STATEMENT REGARDING SEQUENCE LISTING

The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is GNCN_016_01WO_SeqList_ST25.txt. The text file is ≈433 KB, was created on Oct. 8, 2019, and is being submitted electronically via EFS-Web.

BACKGROUND

Cancers are typically classified using pathologic criteria that rely heavily on the tissue site of origin. Recently, large-scale genomics projects spearheaded by The Cancer Genome Atlas (TCGA) have been undertaken in order to provide a detailed molecular characterization of thousands of tumors, thereby making a systematic molecular-based taxonomy of cancer possible (see, for example, The Cancer Genome Atlas Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008; 455:1061-1068; The Cancer Genome Atlas Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011; 474:609-615; The Cancer Genome Atlas Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012a; 489:519-525; The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012b; 487:330-337; The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012c; 490:61-70; The Cancer Genome Atlas Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature. 2013a; 499:43-49; The Cancer Genome Atlas Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. The New England Journal of Medicine. 2013b; 368:2059-2074; The Cancer Genome Atlas Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014; 507:315-322; each of which is herein incorporated by reference). These large-scale genomics projects have shown that each single-tissue cancer type can be further divided into three to four molecular subtypes and meaningful differences in clinical behavior can often be correlated with the single-tissue tumor types. In fact, in a few cases, single-tissue subtype identification has led to therapies that target the driving subtype-specific molecular alteration(s). EGFR-mutant lung adenocarcinomas and ERBB2-amplified breast cancer are two well-established examples.

Building off these projects, more recent studies have undertaken multi-platform integrative analysis of thousands of cancers from numerous tumor types in The Cancer Genome Atlas (TCGA) project in order to determine whether tissue-of-origin categories split into sub-types based upon multi-platform genomic analyses, what molecular alterations are shared across cancers arising from different tissues and if previously recognized disease subtypes in fact span multiple tissues of origin (see Hoadley et al., Cell. 2014 Aug. 14; 158(4):929-944 and Hoadley et al., Cell. 2018 Apr. 5; 173(2):291-304, each of which is herein incorporated by reference). While these studies have helped to elucidate a molecular taxonomy of cancer with newly defined integrated subtypes that can provide a significant increase in the accuracy for the prediction of clinical outcomes, they have relied on performing a second-level cluster analysis (i.e., clustering of cluster assignments (COCA)) using as input data from five ‘omic’ platforms. The ‘omic’ platforms used in the studies for the COCA analysis included whole-exome DNA sequence (Illumina HiSeq and GAII), DNA methylation (Illumina 450,000-feature microarrays), genome-wide mRNA levels (Illumina mRNA-seq), microRNA levels (Illumina microRNA-seq), and protein levels and/or phosphorylated proteins (Reverse Phase Protein Arrays; RPPA).

While the benefits of such a pan-cancer analysis from a clinical standpoint are clear, performing said analysis can be laborious, time-consuming and expensive. Accordingly, there is a need in the art for methods and resources for molecularly characterizing tumor samples in a rapid, efficient and reliable manner regardless of tissue of origin.

The present disclosure addresses the limitations of the current methods and other needs in the field for an efficient method for pan-cancer tumor classification that may inform prognosis and patient management based on underlying genomic and biologic tumor characteristics shared across tumor samples from multiple tissues of origin.

SUMMARY

The methods disclosed herein include determination of a cell of origin subtype, treatment of cancer based on a cell of origin subtype, prediction of overall survival of patients based on a cell of origin subtype, and application of an algorithm to gene expression data for one or a plurality of classifier biomarkers for categorization of a tumor sample into one of 21 clustering of cluster assignments (COCA) subtypes (C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) such that the COCA subtype is indicative of the cell of origin of the tumor sample regardless of the anatomical location of said tumor sample. The algorithm can be a classification to the nearest centroid (ClaNC) algorithm. The C1 COCA subtype can indicate that a tumor sample is substantially similar to or is adrenocortical carcinoma. The C2 COCA subtype can indicate that a tumor sample is substantially similar to or is glioblastoma. The C3 COCA subtype can indicate that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer). The C4 COCA subtype can indicate that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder. The C6 COCA subtype can indicate that a tumor sample is substantially similar to or is lung adenocarcinoma. The C8 COCA subtype can indicate that a tumor sample is substantially similar to or is pancreatic adenocarcinoma. The C9 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine carcinosarcoma. The C10 COCA subtype can indicate that a tumor sample is substantially similar to or is the basal subtype of breast cancer. The C12 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine corpus endometrial cancer. The C14 COCA subtype can indicate that a tumor sample is substantially similar to or is prostate cancer. The C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer. The C16 COCA subtype can indicate that a tumor sample is substantially similar to or is a bladder urothelial carcinoma. The C17 COCA subtype can indicate that a tumor sample is substantially similar to or is a testicular germ cell tumor. The C19 COCA subtype can indicate that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma. The C20 COCA subtype can indicate that a tumor sample is substantially similar to or is a sarcoma. The C21 COCA subtype can indicate that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma. The C22 COCA subtype can indicate that a tumor sample is substantially similar to or is liver hepatocellular carcinoma. The C24 COCA subtype can indicate that a tumor sample is substantially similar to or is the luminal subtype of breast cancer. The C25 COCA subtype can indicate that a tumor sample is substantially similar to or is thymoma. The C26 COCA subtype can indicate that a tumor sample is substantially similar to or is melanoma. The C28 COCA subtype can indicate that a tumor sample is substantially similar to or is thyroid cancer.

In one aspect, provided herein is a method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype. In some cases, the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step. In some cases, the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm. In some cases, the expression level of the classifier biomarker is detected at the nucleic acid level. In some cases, the nucleic acid level is RNA or cDNA. In some cases, the detecting an expression level comprises performing a quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarray analysis, gene chips, an nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. In some cases, the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof (i.e., serum or plasma), urine, saliva, or sputum. In some cases, the at least one classifier biomarker comprises a plurality of classifier biomarkers. In some cases, the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 4 classifier biomarkers, at least 6 classifier biomarkers, at least 8 classifier biomarkers, at least 10 classifier biomarkers, at least 12 classifier biomarkers, at least 14 classifier biomarkers, at least 16 classifier biomarkers, at least 18 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
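By way of non-limiting illustration only, the comparing and classifying steps described above can be sketched in Python as a correlation-based nearest-centroid classifier. This is a simplified sketch and is not the ClaNC implementation itself. The function names (pearson, build_centroids, classify), the four-gene expression vectors and the toy training values are hypothetical and are not taken from Table 1, and the choice of Pearson correlation against a per-subtype mean profile is an assumption made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def build_centroids(training):
    """Average the reference samples of each subtype, gene by gene.

    training maps a subtype label to a list of expression vectors that
    all use the same gene order.
    """
    centroids = {}
    for subtype, samples in training.items():
        n = len(samples)
        centroids[subtype] = [sum(column) / n for column in zip(*samples)]
    return centroids


def classify(sample, centroids):
    """Return the subtype whose centroid correlates best with the sample."""
    return max(centroids, key=lambda s: pearson(sample, centroids[s]))


# Hypothetical toy training set: two subtypes, four genes, two samples each.
toy_training = {
    "C4": [[9.0, 1.0, 2.0, 8.0], [8.0, 2.0, 1.0, 9.0]],
    "C24": [[1.0, 9.0, 8.0, 2.0], [2.0, 8.0, 9.0, 1.0]],
}
toy_centroids = build_centroids(toy_training)
predicted = classify([7.5, 1.5, 2.5, 8.5], toy_centroids)
print(predicted)
```

In this sketch a sample is assigned to exactly one subtype. A practical implementation would also need to handle ties, missing genes and normalization between platforms, none of which is shown here.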

In another aspect, provided herein is a method of detecting a biomarker in a tumor sample obtained from a patient, the method comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay. In some cases, the patient is suffering from or is suspected of suffering from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); or Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC). In some cases, the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay(s), Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. In some cases, the detection of the expression level comprises using at least one pair of oligonucleotide primers per each of the plurality of biomarker nucleic acids selected from Table 1. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof, urine, saliva, or sputum. In some cases, the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the plurality of biomarker nucleic acids comprises, consists essentially of or consists of all the classifier biomarker nucleic acids of Table 1.

In yet another aspect, provided herein is a method of treating cancer in a subject, the method comprising: measuring the expression level of at least one biomarker nucleic acid in a tumor sample obtained from the subject, wherein the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer. In some cases, the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the method further comprises measuring the expression of at least one biomarker from an additional set of biomarkers. In some cases, the additional set of biomarkers comprises at least an immune cell signature, a cell proliferation signature, or drug target genes. In some cases, the measuring the expression level is conducted using an amplification, hybridization and/or sequencing assay. In some cases, the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay(s), Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof, urine, saliva, or sputum. In some cases, the subject's COCA subtype is selected from C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28.

In still another aspect, provided herein is a method of predicting overall survival in a cancer patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient. In some cases, the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step. In some cases, the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm. In some cases, the expression level of the classifier biomarker is detected at the nucleic acid level. In some cases, the nucleic acid level is RNA or cDNA. In some cases, the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. In some cases, the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof, urine, saliva, or sputum. In some cases, the at least one classifier biomarker comprises a plurality of classifier biomarkers. In some cases, the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a cross-tabulation of the TCGA tumor type and COCA subtype from Hoadley et al., Cell. 2018 Apr. 5; 173(2):291-304 for samples with qualifying expression data as described in Example 1. FIG. 1 also provides the integrated tumor subtypes provided herein.

FIG. 2 illustrates how the TCGA samples were divided into a training set (⅔ of the data set; n=5696) and test set (⅓ of the data set), balancing for uniform tumor type of origin distributions for development of the 84-gene subtyper described herein (see the Table in FIG. 2). As illustrated in the graph in FIG. 2, using the training set, genes with low variance and/or low mean were filtered out, while genes with mean variance and mean expression values greater than 4 were kept, resulting in gene expression data for 2190 genes.
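By way of non-limiting illustration only, the division into training and test sets and the gene filtering described for FIG. 2 can be sketched in Python as follows. The function names (stratified_split, filter_genes), the sample and gene identifiers and the toy values are hypothetical. Applying a threshold of 4 to both the per-gene mean and the per-gene variance, computed on the training samples only, is one possible reading of the filtering criterion and is an assumption made for the example.

```python
import random
from collections import defaultdict
from statistics import mean, pvariance


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample IDs into training and test sets within each tumor type.

    labels maps a sample ID to its tumor type, so each tumor type
    contributes roughly the same fraction of its samples to each set.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for sample_id, tumor_type in labels.items():
        by_type[tumor_type].append(sample_id)
    train, test = [], []
    for tumor_type in sorted(by_type):
        ids = sorted(by_type[tumor_type])
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


def filter_genes(expression, train_ids, min_mean=4.0, min_variance=4.0):
    """Keep genes whose training-set mean and variance both exceed a threshold.

    expression maps a gene name to a dict of sample ID -> expression value.
    The thresholds are assumptions for this sketch.
    """
    kept = []
    for gene, values in expression.items():
        train_values = [values[s] for s in train_ids]
        if mean(train_values) > min_mean and pvariance(train_values) > min_variance:
            kept.append(gene)
    return kept


# Hypothetical toy data: six BLCA samples and three BRCA samples.
toy_labels = {f"BLCA_{i}": "BLCA" for i in range(6)}
toy_labels.update({f"BRCA_{i}": "BRCA" for i in range(3)})

toy_expression = {
    "G_FLAT": {s: 10.0 for s in toy_labels},  # high mean, zero variance
    "G_LOW": {s: 0.5 for s in toy_labels},    # low mean, zero variance
    "G_VAR": {s: (20.0 if t == "BLCA" else 2.0) for s, t in toy_labels.items()},
}

train_ids, test_ids = stratified_split(toy_labels)
kept_genes = filter_genes(toy_expression, train_ids)
print(len(train_ids), len(test_ids), kept_genes)
```

Filtering on the training set alone, as in this sketch, keeps the test set from influencing which genes are carried forward.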

FIG. 3 illustrates five-fold cross validation curves using classification to the nearest centroid (ClaNC) on the TCGA-2018 training dataset (n=408) to guide the selection of the number of genes per subtype to include in the signature for COCA subtyping provided herein.

FIG. 4 illustrates agreement and disagreement between the GS subtype (rows) and the subtype based on the 84-gene subtyper (columns) (left panel) for the test set described in Example 1. The right panel shows agreement for each COCA subtype listed. Overall agreement was 90%. Overall agreement with COCA on the training set was 91%.

FIG. 5 shows the proportion of COCA subtypes in the test set that were called correctly by the 84-gene typer developed in Example 1.
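By way of non-limiting illustration only, the kind of agreement summary reported for FIG. 4 and FIG. 5 (a cross-tabulation of reference calls against predicted calls, an overall agreement rate and a per-subtype proportion called correctly) can be computed as sketched below in Python. The function name (agreement) and the toy calls are hypothetical and do not reproduce the data of Example 1.

```python
from collections import Counter


def agreement(reference, predicted):
    """Cross-tabulate reference and predicted subtype calls.

    Returns the cross-tabulation as a Counter keyed by
    (reference, predicted), the overall fraction of matching calls, and
    the fraction of each reference subtype that was called correctly.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    table = Counter(zip(reference, predicted))
    matches = sum(n for (ref, pred), n in table.items() if ref == pred)
    overall = matches / len(reference)
    totals = Counter(reference)
    per_subtype = {s: table[(s, s)] / totals[s] for s in totals}
    return table, overall, per_subtype


# Hypothetical toy calls for five samples.
toy_reference = ["C4", "C4", "C24", "C24", "C8"]
toy_predicted = ["C4", "C16", "C24", "C24", "C8"]

toy_table, toy_overall, toy_per_subtype = agreement(toy_reference, toy_predicted)
print(toy_overall, toy_per_subtype)
```

In this toy example four of five calls match, and the single disagreement appears in the cross-tabulation as one C4 reference sample called C16.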

FIG. 6 shows results of within cancer-type survival analysis for bladder cancer (BLCA) via testing for association of COCA subtypes from BLCA samples with overall survival. p=0.0204 for COCA subtype C4 as determined using the 84-gene COCA subtyper provided herein.

FIG. 7 shows results of within cancer-type survival analysis for breast cancer (BRCA) via testing for association of COCA subtypes from BRCA samples with overall survival. p=0.00013 for COCA subtype C24 as determined using the 84-gene COCA subtyper provided herein.

FIG. 8 shows results of within cancer-type survival analysis for stomach adenocarcinoma (STAD) via testing for association of COCA subtypes from STAD samples with overall survival. p=0.00689 for COCA subtype C8 as determined using the 84-gene COCA subtyper provided herein.

DETAILED DESCRIPTION

Definitions

While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter.

As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Additionally, the use of “or” is intended to include “and/or” unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”. The term “about” as used herein can refer to a range that is 15%, 10%, 8%, 6%, 4%, or 2% plus or minus from a stated numerical value.

Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense that is as “including, but not limited to”. The use of the alternative (e.g., “or”) should be understood to mean either one, both, or any combination thereof of the alternatives. As used herein, the terms “about” and “consisting essentially of” mean +/−20% of the indicated range, value, or structure, unless otherwise indicated.

Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification may not necessarily all be referring to the same embodiment. It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.

Throughout this disclosure, various aspects of the methods and compositions provided herein can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.

Unless otherwise indicated, the methods and compositions provided herein can utilize conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London, Nelson and Cox (2000), Lehninger et al., (2008) Principles of Biochemistry 5th Ed., W.H. Freeman Pub., New York, N.Y. and Berg et al. (2006) Biochemistry, 6th Ed., W.H. Freeman Pub., New York, N.Y., all of which are herein incorporated in their entirety by reference for all purposes.

Conventional software and systems may also be used in the methods and compositions provided herein. Computer software products for use herein typically include computer readable medium having computer-executable instructions for performing the logic steps of any of the methods provided herein. Suitable computer readable medium include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash memory, ROM/RAM, magnetic tapes, etc. The computer-executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are described in, for example, Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000) and Ouelette and Bzevanis Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd ed., 2001). See U.S. Pat. No. 6,420,108.

The methods and compositions provided herein may also make use of various computer program products and software for a variety of purposes, such as probe design, management of data, analysis, and instrument operation. See, U.S. Pat. Nos. 5,593,839, 5,795,716, 5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127, 6,229,911 and 6,308,170. Computer methods related to genotyping using high density microarray analysis may also be used in the present methods, see, for example, US Patent Pub. Nos. 20050250151, 20050244883, 20050108197, 20050079536 and 20050042654.

Additionally, the present disclosure may have preferred embodiments that include methods for providing genetic information over networks such as the Internet as shown in U.S. Patent Pub. Nos. 20030097222, 20020183936, 20030100995, 20030120432, 20040002818, 20040126840, and 20040049354.

As used herein, the terms “individual,” “patient,” and “subject” can refer to any single animal, more preferably a mammal (including such non-human animals as, for example, dogs, cats, horses, rabbits, zoo animals, cows, pigs, sheep, and non-human primates) for which treatment is desired. In particular embodiments, the individual or patient herein is a human.

It will be appreciated that the term “healthy” as used herein is relative to cancer status, as the term “healthy” cannot be defined to correspond to any absolute evaluation or status. Thus, an individual defined as healthy with reference to any specified disease or disease criterion can in fact be diagnosed with any other one or more diseases, or exhibit any other one or more disease criteria, including one or more other cancers.

The term “tumor,” as used herein, can refer to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The terms “cancer,” “cancerous,” and “tumor” are not mutually exclusive and can be used interchangeably.

The term “detection” can include any means of detecting, including direct and indirect detection.

The terms “substantially” or “substantial” as used herein can mean substantially similar in function or capability or otherwise competitive to the products, items (e.g., type of cancer, nucleic acid complement), services or methods recited herein. Substantially similar products, items (e.g., type of cancer, nucleic acid complement), services or methods are at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% similar or the same as a product, item (e.g., type of cancer, nucleic acid complement), service or method recited herein.

Overview

Provided herein are kits, compositions and methods for identifying, determining, detecting or diagnosing integrated, pan-cancer clustering of cluster assignment (COCA) subtypes. That is, the methods can be useful for molecularly defining subsets of cancer regardless of tissue of origin. The methods provide a pan-cancer classification of a tumor sample obtained from a subject that can be prognostic and predictive for therapeutic response. The therapeutic response can include a response to chemotherapy, immunotherapy, angiogenesis inhibitor therapy, surgical intervention and/or radiotherapy. The methods can also provide a prognosis of overall survival for cancer patients according to their pan-cancer, integrated COCA subtype. The kits, compositions and methods provided herein can be used to classify a tumor sample as being any type of COCA subtype known in the art. In one embodiment, the COCA subtype determined or diagnosed by the methods and compositions provided herein is selected from C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).

The COCA subtype determined using the kits, compositions or methods provided herein can indicate or disclose the cell or tissue of origin of a tumor sample obtained from a subject. For example, the C1 COCA subtype can indicate that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype can indicate that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype can indicate that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype can indicate that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype can indicate that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype can indicate that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype can indicate that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype can indicate that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype can indicate that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype can indicate that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype can indicate that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype can indicate that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype can indicate that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype can indicate that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype can indicate that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype can indicate that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype can indicate that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype can indicate that a tumor sample is substantially similar to or is thyroid cancer.

“Determining a COCA subtype” can include, for example, diagnosing or detecting the presence, sub-type and cell-of-origin of a cancer, monitoring the progression of the disease, and identifying or detecting cells or samples that are indicative of said pan-cancer subtypes.

In one embodiment, the COCA subtype is assessed or determined through the evaluation of expression patterns, or profiles, of one or a plurality of classifier biomarkers or biomarkers in one or more subject samples. The term subject, or subject sample, may refer to an individual regardless of health and/or disease status. A subject can be a subject, a study participant, a test subject, a control subject, a screening subject, or any other class of individual from whom a sample is obtained and assessed in the context of the methods and compositions provided herein. Accordingly, a subject can be previously diagnosed with one type of a myriad of cancers, can present with one or more symptoms of said type of cancer, or a predisposing factor, such as a family (genetic) or medical history (medical) factor for said type of cancer, can be undergoing treatment or therapy for said cancer, or the like. Alternatively, a subject can be healthy as defined herein with respect to any of the aforementioned factors or criteria.

The myriad of cancers from which a subject may be suffering or suspected of suffering can be any cancer known in the art. The classifier biomarkers provided herein (e.g., the classifier biomarkers of Table 1) and methods of using said classifier biomarkers can be used to determine an integrated, pan-cancer COCA subtype of the cancer that said subject may be or is suspected of suffering from. Further to any of the embodiments provided herein, the cancer can include, but is not limited to, carcinoma, lymphoma, blastoma (including medulloblastoma and retinoblastoma), sarcoma (including liposarcoma and synovial cell sarcoma), neuroendocrine tumors (including carcinoid tumors, gastrinoma, and islet cell cancer), mesothelioma, schwannoma (including acoustic neuroma), meningioma, adenocarcinoma, melanoma, and leukemia or lymphoid malignancies. Examples of a cancer can also include, but are not limited to, a lung cancer (e.g., a non-small cell lung cancer (NSCLC) or small cell lung cancer), a kidney cancer (e.g., a kidney urothelial carcinoma or RCC), a bladder cancer (e.g., a bladder urothelial (transitional cell) carcinoma (e.g., locally advanced or metastatic urothelial cancer, including 1L or 2L+ locally advanced or metastatic urothelial carcinoma)), a breast cancer, a colorectal cancer (e.g., a colon adenocarcinoma), an ovarian cancer, a pancreatic cancer, a gastric carcinoma, an esophageal cancer, a mesothelioma, a melanoma (e.g., a skin melanoma), a head and neck cancer (e.g., a head and neck squamous cell carcinoma (HNSCC)), a thyroid cancer, a sarcoma (e.g., a soft-tissue sarcoma, a fibrosarcoma, a myxosarcoma, a liposarcoma, an osteogenic sarcoma, an osteosarcoma, a chondrosarcoma, an angiosarcoma, an endotheliosarcoma, a lymphangiosarcoma, a lymphangioendotheliosarcoma, a leiomyosarcoma, or a rhabdomyosarcoma), a prostate cancer, a glioblastoma, a cervical cancer, a thymic carcinoma, a leukemia (e.g., an acute lymphocytic leukemia (ALL), an acute myelocytic leukemia (AML), a chronic myelocytic leukemia (CML), a chronic eosinophilic leukemia, or a chronic lymphocytic leukemia (CLL)), a lymphoma (e.g., a Hodgkin lymphoma or a non-Hodgkin lymphoma (NHL)), a myeloma (e.g., a multiple myeloma (MM)), a mycosis fungoides, a Merkel cell cancer, a hematologic malignancy, a cancer of hematological tissues, a B cell cancer, a bronchus cancer, a stomach cancer, a brain or central nervous system cancer, a peripheral nervous system cancer, a uterine or endometrial cancer, a cancer of the oral cavity or pharynx, a liver cancer, a testicular cancer, a biliary tract cancer, a small bowel or appendix cancer, a salivary gland cancer, an adrenal gland cancer, an adenocarcinoma, an inflammatory myofibroblastic tumor, a gastrointestinal stromal tumor (GIST), a colon cancer, a myelodysplastic syndrome (MDS), a myeloproliferative disorder (MPD), a polycythemia vera, a chordoma, a synovioma, a Ewing's tumor, a squamous cell carcinoma, a basal cell carcinoma, a sweat gland carcinoma, a sebaceous gland carcinoma, a papillary carcinoma, a papillary adenocarcinoma, a medullary carcinoma, a bronchogenic carcinoma, a renal cell carcinoma, a hepatoma, a bile duct carcinoma, a choriocarcinoma, a seminoma, an embryonal carcinoma, a Wilms' tumor, a bladder carcinoma, an epithelial carcinoma, a glioma, an astrocytoma, a medulloblastoma, a craniopharyngioma, an ependymoma, a pinealoma, a hemangioblastoma, an acoustic neuroma, an oligodendroglioma, a meningioma, a neuroblastoma, a retinoblastoma, a follicular lymphoma, a diffuse large B-cell lymphoma, a mantle cell lymphoma, a hepatocellular carcinoma, a thyroid cancer, a small cell cancer, an essential thrombocythemia, an agnogenic myeloid metaplasia, a hypereosinophilic syndrome, a systemic mastocytosis, a familial hypereosinophilia, a neuroendocrine cancer, or a carcinoid tumor.

In one embodiment, the cancer is selected from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC); and Acute Myeloid Leukemia (LAML). In another embodiment, the cancer is selected from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); and Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).

As used herein, an “expression profile” or an “expression pattern” or a “biomarker profile” or a “gene signature” can comprise one or more values corresponding to a measurement of the relative abundance, level, presence, or absence of expression of a discriminative or classifier biomarker or biomarker. An expression profile can be derived from a subject prior to or subsequent to a diagnosis of a type of cancer, can be derived from a biological sample collected from a subject at one or more time points prior to or following treatment or therapy, can be derived from a biological sample collected from a subject at one or more time points during which there is no treatment or therapy (e.g., to monitor progression of disease or to assess development of disease in a subject diagnosed with or at risk for a type of cancer), or can be collected from a healthy subject. The term subject can be used interchangeably with patient. The patient can be a human patient. The one or a plurality of classifier biomarkers that can make up an expression profile as provided herein can be selected from one or more biomarkers of Table 1 and/or any additional set of biomarker classifiers disclosed herein.

As used herein, the term “determining an expression level” or “determining an expression profile” or “detecting an expression level” or “detecting an expression profile” as used in reference to a biomarker or classifier can mean the application of a biomarker specific reagent such as a probe, primer or antibody and/or a method applied to a sample, for example a sample of the subject or patient and/or a control sample, for ascertaining or measuring quantitatively, semi-quantitatively or qualitatively the amount of a biomarker or biomarkers, for example the amount of biomarker polypeptide or mRNA (or cDNA derived therefrom). The level of a biomarker as provided herein can be determined by any number of methods known in the art and/or provided herein. The methods can include, for example, immunoassays such as immunohistochemistry, ELISA, Western blot, immunoprecipitation and the like, where a biomarker detection agent such as an antibody, for example a labeled antibody, specifically binds the biomarker and permits, for example, relative or absolute ascertaining of the amount of polypeptide biomarker. The methods can also include hybridization and PCR protocols where a probe or primer or primer set is used to ascertain the amount of nucleic acid biomarker, including, for example, probe based and amplification based methods such as microarray analysis, RT-PCR such as quantitative RT-PCR (qRT-PCR), serial analysis of gene expression (SAGE), Northern Blot, digital molecular barcoding technology, for example NanoString nCounter Analysis, and TaqMan quantitative PCR assays. Other methods of mRNA detection and quantification can be applied, such as mRNA in situ hybridization in formalin-fixed, paraffin-embedded (FFPE) tissue samples or cells. This technology is currently offered by the QuantiGene ViewRNA (Affymetrix), which uses probe sets for each mRNA that bind specifically to an amplification system to amplify the hybridization signals; these amplified signals can be visualized using a standard fluorescence microscope or imaging system. This system can, for example, detect and measure transcript levels in heterogeneous samples, such as a sample that has normal and tumor cells present in the same tissue section. As mentioned, TaqMan probe-based gene expression analysis (PCR-based) can also be used for measuring gene expression levels in tissue samples, and this technology has been shown to be useful for measuring mRNA levels in FFPE samples. In brief, TaqMan probe-based assays utilize a probe that hybridizes specifically to the mRNA target. This probe contains a quencher dye and a reporter dye (fluorescent molecule) attached to each end, and fluorescence is emitted only when specific hybridization to the mRNA target occurs. During the amplification step, the exonuclease activity of the polymerase enzyme causes the quencher and the reporter dyes to be detached from the probe, and fluorescence emission can occur. This fluorescence emission is recorded and signals are measured by a detection system; these signal intensities are used to calculate the abundance of a given transcript (gene expression) in a sample.

In one embodiment, the “expression profile” or a “biomarker profile” or “gene signature” associated with the classifier biomarkers described herein (e.g., Table 1 and/or any additional set of biomarker classifiers as disclosed herein) can be useful for distinguishing between normal and tumor samples. In another embodiment, the tumor samples are one type of cancer as determined based on tissue of origin. The one type of cancer can be any type of cancer known in the art and/or provided herein. In another embodiment, the cancer can be further classified as a specific clustering of cluster assignment (COCA) subtype based upon an expression profile of one or more classifier biomarkers (e.g., Table 1) determined using the methods provided herein. The specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In one embodiment, the specific COCA subtype can be selected from C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA. Expression profiles using the classifier biomarkers disclosed herein (e.g., Table 1, Table 2 and any additional set of biomarker classifiers as disclosed herein) can provide valuable molecular tools for specifically identifying COCA subtypes, and for treating a cancer based on its COCA subtype. Accordingly, provided herein are methods for screening and classifying a subject for pan-cancer COCA subtypes.

In some instances, a single classifier biomarker or a plurality of classifier biomarkers provided herein (e.g., from Table 1) is capable of identifying COCA subtypes of cancer with a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, inclusive of all ranges and subranges therebetween.

In some instances, a single classifier biomarker or a plurality of classifier biomarkers as provided herein (e.g., from Table 1) is capable of determining COCA subtypes of cancer with a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, inclusive of all ranges and subranges therebetween.
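As an illustration of the performance measures named above, the following minimal Python sketch computes predictive success (taken here to mean overall accuracy, which is an assumption) together with one-vs-rest sensitivity and specificity for a single COCA subtype. The function names and the ten example labels are hypothetical and are not data from this disclosure.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted subtype matches the known subtype."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def subtype_metrics(true_labels, predicted_labels, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype."""
    tp = fn = fp = tn = 0
    for truth, call in zip(true_labels, predicted_labels):
        if truth == subtype and call == subtype:
            tp += 1  # subtype present and called
        elif truth == subtype:
            fn += 1  # subtype present but missed
        elif call == subtype:
            fp += 1  # subtype called but absent
        else:
            tn += 1  # subtype absent and not called
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical known subtypes and classifier calls for ten tumor samples.
truth = ["C2", "C2", "C4", "C4", "C4", "C8", "C8", "C8", "C8", "C3"]
calls = ["C2", "C2", "C4", "C4", "C8", "C8", "C8", "C8", "C4", "C3"]

overall = accuracy(truth, calls)
c4_sensitivity, c4_specificity = subtype_metrics(truth, calls, "C4")
print(f"accuracy={overall:.2f} "
      f"C4 sensitivity={c4_sensitivity:.2f} C4 specificity={c4_specificity:.2f}")
```

In practice such figures would be estimated on samples held out from classifier training; the sketch only shows how the quantities are defined.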

Also encompassed herein is a system capable of distinguishing various COCA subtypes of cancer not detectable using current methods. This system can be capable of processing a large number of subjects and subject variables such as expression profiles and other diagnostic criteria. In one embodiment, the methods for determining a COCA subtype as provided herein using one or a plurality of classifier biomarkers as provided herein (e.g., Table 1) can be part of a system capable of distinguishing various COCA subtypes that also utilizes data accumulated from other diagnostic methods. The other diagnostic methods can include additional genome-wide molecular assays or platforms, histochemical, immunohistochemical, cytologic, immunocytologic, visual diagnostic methods including histologic or morphometric evaluation of cancer or tumor tissue or any combination thereof. The additional genome-wide molecular assays or platforms can be selected from whole-exome DNA sequencing assays (e.g., Illumina HiSeq and GAII), DNA copy-number variation assays (e.g., Affymetrix 6.0 microarrays), DNA methylation assays (e.g., Illumina 450,000-feature microarrays), genome-wide mRNA level assays (e.g., Illumina mRNA-seq), microRNA level assays (e.g., Illumina microRNA-seq), and protein level assays for proteins and/or phosphorylated proteins (e.g., Reverse Phase Protein Arrays; RPPA).

In various embodiments, the expression profile derived from a subject (e.g., from a sample obtained from said subject) is compared to a reference expression profile. A “reference expression profile” or “control expression profile” can be a profile derived from the subject prior to treatment or therapy; can be a profile produced from the subject sample at a particular time point (usually prior to or following treatment or therapy, but can also include a particular time point prior to or following diagnosis of a type of cancer); or can be derived from a healthy individual or a pooled reference from healthy individuals. A reference expression profile can be specific to different COCA subtypes of cancer. The COCA reference expression profile can be from any tissues from which a specific COCA subtype has been found. As provided herein, in one embodiment, the specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In one embodiment, the specific COCA subtype can be selected from a C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype.

The reference expression profile can be compared to a test expression profile or vice versa. A “test expression profile” can be derived from the same subject as the reference expression profile except at a subsequent time point (e.g., one or more days, weeks or months following collection of the reference expression profile) or can be derived from a different subject. In summary, any test expression profile of a subject can be compared to a previously collected profile from a subject that has a specific COCA subtype. The specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In one embodiment, the specific COCA subtype can be selected from a C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype.
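One way such a comparison could be carried out is sketched below in Python: a test expression profile is scored against several reference expression profiles by Pearson correlation, and the most similar reference is reported. Pearson correlation is an assumed similarity measure that this passage does not specify. The gene symbols come from Table 1, but every expression value, profile name and function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar_reference(test_profile, reference_profiles):
    """Return the best-matching reference name and all correlation scores."""
    genes = sorted(test_profile)  # fixed gene order shared by every profile
    test_vector = [test_profile[g] for g in genes]
    scores = {
        name: pearson(test_vector, [reference[g] for g in genes])
        for name, reference in reference_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical log-scale expression values for four Table 1 genes.
references = {
    "C4_reference": {"KRT6A": 9.0, "TP63": 8.0, "CDH1": 5.0, "ESR1": 1.0},
    "C24_reference": {"KRT6A": 1.0, "TP63": 2.0, "CDH1": 6.0, "ESR1": 9.0},
    "healthy_pool": {"KRT6A": 4.0, "TP63": 4.0, "CDH1": 5.0, "ESR1": 4.0},
}
test = {"KRT6A": 8.5, "TP63": 7.0, "CDH1": 5.5, "ESR1": 1.5}

best_match, all_scores = most_similar_reference(test, references)
print(best_match, {k: round(v, 3) for k, v in all_scores.items()})
```

The same function applies whether the reference is an earlier profile from the same subject, a pooled healthy reference, or a subtype-specific profile, because each is simply another entry in the dictionary of references.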

The classifier biomarkers provided herein (e.g., Table 1) for use in the methods, compositions or kits provided herein can include nucleic acids (RNA, cDNA, and DNA) and proteins, and variants and fragments thereof. Such biomarkers can include DNA comprising the entire or partial sequence of the nucleic acid sequence encoding the biomarker, or the complement of such a sequence. The biomarkers described herein can include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest, or their non-natural cDNA products, obtained synthetically in vitro in a reverse transcription reaction. The biomarker nucleic acids can also include any expression product or portion thereof of the nucleic acid sequences of interest. A biomarker protein can be a protein encoded by or corresponding to a DNA biomarker provided herein. A biomarker protein can comprise the entire or partial amino acid sequence of any of the biomarker proteins or polypeptides. The biomarker nucleic acid can be extracted from a bodily fluid (e.g., blood or fractions thereof, urine, saliva, CSF, etc.), a cell or can be cell free or extracted from an extracellular vesicular entity such as an exosome.

A “classifier biomarker” or “biomarker” or “classifier gene” can be any nucleic acid (DNA, RNA or cDNA) or protein whose level of expression in a tissue or cell is altered compared to that of a normal or healthy cell or tissue or any other reference or control as provided herein. For example, a “classifier biomarker” or “biomarker” or “classifier gene” can be any nucleic acid (DNA, RNA or cDNA) or protein whose level of expression in a tissue or cell is altered in a specific COCA subtype. The detection of the biomarkers provided herein can permit the determination of the specific COCA subtype. The “classifier biomarker” or “biomarker” or “classifier gene” may be one that is up-regulated (e.g. expression is increased) or down-regulated (e.g. expression is decreased) relative to a reference or control as provided herein. The reference or control can be any reference or control as provided herein. In some embodiments, the expression values of nucleic acids (DNA, RNA or cDNA) that are up-regulated or down-regulated in a particular COCA subtype of cancer can be pooled into one gene signature. The overall expression level in each gene signature is referred to herein as the “expression profile” and is used to classify a test sample (i.e., a sample obtained from a subject suffering from or suspected of suffering from cancer) according to the COCA subtype of cancer. However, it is understood that independent evaluation of expression for each of the genes disclosed herein can be used to classify tumor subtypes without the need to group up-regulated and down-regulated genes into one or more gene signatures. In some cases, as shown in Tables 1 and 2, a total of 84 biomarkers can be used for COCA subtype determination. For a specific COCA subtype, for example, 4 of the 84 biomarkers of Table 1 can have altered expression that is correlated therewith. Further, the correlation of the 4 of the 84 biomarkers of Table 1 with the specific COCA subtype can be positive, negative or a combination thereof.
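The pooling of up-regulated and down-regulated genes into one signature can be pictured with the short Python sketch below. It assumes one simple scoring rule (mean expression of the positively correlated genes minus mean expression of the negatively correlated genes), which this passage does not prescribe. The direction assigned to each of the four genes follows the sign of its C2 (GBM/LGG) coefficient in Table 2; the sample expression values and the function name are hypothetical.

```python
def signature_score(expression, up_genes, down_genes):
    """Mean expression of up-regulated genes minus mean of down-regulated genes."""
    up_values = [expression[g] for g in up_genes]
    down_values = [expression[g] for g in down_genes]
    up_mean = sum(up_values) / len(up_values) if up_values else 0.0
    down_mean = sum(down_values) / len(down_values) if down_values else 0.0
    return up_mean - down_mean


# Directions taken from the sign of the C2 column of Table 2:
# BCAN and APC2 are positive, CDH1 and CLDN4 are negative.
c2_up = ["BCAN", "APC2"]
c2_down = ["CDH1", "CLDN4"]

# Hypothetical normalized expression values for one test sample.
sample = {"BCAN": 10.0, "APC2": 8.0, "CDH1": 2.0, "CLDN4": 1.0}

score = signature_score(sample, c2_up, c2_down)
print(score)  # a large positive score is consistent with the C2 pattern
```

Because the two directions are combined by subtraction, a sample that shows both the expected increases and the expected decreases scores higher than a sample showing only one of them.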

The classifier biomarkers for use in the methods provided herein can include any nucleic acid (DNA, RNA or cDNA) or protein that is selectively expressed in COCA subtypes of cancer, as defined herein above. Sample biomarker genes are listed in Table 1 below.

In one embodiment, the 84-gene signature for COCA subtyping is found in Table 1. The relative gene expression levels as represented by nearest centroid coefficients of the classifier biomarkers for the 84-gene pan-cancer subtyper of Table 1 are shown in Table 2.
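A minimal Python sketch of nearest centroid assignment follows. It treats each subtype's column of coefficients as a centroid and assigns a sample to the subtype whose centroid lies closest in Euclidean distance. The distance measure, the restriction to three genes and three subtypes, and the sample values are all assumptions made for illustration; the centroid values are Table 2 coefficients rounded to two decimals.

```python
import math

GENES = ["BCAN", "CLCA2", "CTSE"]

# Table 2 coefficients, rounded to two decimals, for three of the subtypes.
CENTROIDS = {
    "C2": {"BCAN": 11.98, "CLCA2": -0.55, "CTSE": -2.77},
    "C4": {"BCAN": -0.66, "CLCA2": 10.56, "CTSE": -0.07},
    "C8": {"BCAN": 0.65, "CLCA2": -0.98, "CTSE": 10.28},
}


def nearest_centroid(sample, centroids, genes=GENES):
    """Return the subtype with the closest centroid and all distances."""
    sample_vector = [sample[g] for g in genes]
    distances = {
        subtype: math.dist(sample_vector, [centroid[g] for g in genes])
        for subtype, centroid in centroids.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical sample, assumed to be on the same scale as the centroids.
sample = {"BCAN": -0.5, "CLCA2": 9.8, "CTSE": 0.4}

subtype, distances = nearest_centroid(sample, CENTROIDS)
print(subtype, {k: round(v, 2) for k, v in distances.items()})
```

A full classifier would use all 84 genes and every subtype column, and would require the test sample to be normalized in the same way as the data from which the coefficients were derived. Correlation-based distances are a common alternative to Euclidean distance.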

TABLE 1. 84 Gene Classifier Biomarker Signature for Pan-Cancer COCA subtyping.

SEQ ID NO. | Gene Symbol | Gene Name | GenBank Accession Number*
1 | A1BG | Alpha-1-B Glycoprotein | NM_130786.3
2 | ACPP | Acid Phosphatase, Prostate | NM_001099.5
3 | APC2 | APC2, WNT Signaling Pathway Regulator | NM_001351273.1
4 | AQP5 | Aquaporin 5 | NM_001651.4
5 | ASGR1 | asialoglycoprotein receptor 1 | NM_001671.5
6 | BCAN | brevican | NM_021948.5
7 | BCL2L15 | BCL2 like 15 | NM_001010922.3
8 | C1orf172 | keratinocyte differentiation factor 1 | NM_152365.3
9 | CAPS | calcyphosine | NM_004058.5
10 | CBLC | Cbl proto-oncogene C | NM_012116.4
11 | CDH1 | cadherin 1 | NM_004360.5
12 | CEACAM5 | carcinoembryonic antigen related cell adhesion molecule 5 | NM_004363.5
13 | CEACAM6 | carcinoembryonic antigen related cell adhesion molecule 6 | NM_002483.7
14 | CHMP4C | multivesicular body protein 4C | NM_152284.4
15 | CLCA2 | chloride channel accessory 2 | NM_006536.7
16 | CLDN4 | claudin 4 | NM_001305.4
17 | COL11A2 | collagen type XI alpha 2 chain | NM_080680.2
18 | CRB3 | crumbs cell polarity complex component 3 | NM_139161.5
19 | CTSE | cathepsin E | NM_001910.4
20 | CUBN | cubilin | NM_001081.3
21 | CYP2B7P1 | cytochrome P450 family 2 subfamily B member 7, pseudogene | NR_001278.1
22 | DLX5 | distal-less homeobox 5 | NM_005221.6
23 | DMGDH | dimethylglycine dehydrogenase | NM_013391.3
24 | ELF3 | E74 like ETS transcription factor 3 | NM_004433.5
25 | EMX2 | empty spiracles homeobox 2 | NM_004098.4
26 | EMX2OS | EMX2 opposite strand/antisense RNA | NR_002791.2
27 | EPCAM | epithelial cell adhesion molecule | NM_002354.2
28 | ERBB3 | erb-b2 receptor tyrosine kinase 3 | NM_001982.3
29 | ESR1 | estrogen receptor 1 | NM_000125.3
30 | FAM171A2 | family with sequence similarity 171 member A2 | NM_198475.2
31 | FOLH1 | folate hydrolase 1 | NM_004476.3
32 | GABRP | gamma-aminobutyric acid type A receptor pi subunit | NM_014211.3
33 | GATA3 | GATA binding protein 3 | NM_001002295.2
34 | GCNT3 | glucosaminyl (N-acetyl) transferase 3, mucin type | NM_004751.3
35 | GPC2 | glypican 2 | NM_152742.3
36 | GPR35 | G protein-coupled receptor 35 | NM_001195381.1
37 | GPRC5A | G protein-coupled receptor class C group 5 member A | NM_003979.3
38 | GRHL2 | grainyhead like transcription factor 2 | NM_024915.3
39 | HNF1A | HNF1 homeobox A | NM_000545.6
40 | HPX | hemopexin | NM_000613.3
41 | IYD | iodotyrosine deiodinase | NM_203395.2
42 | KRT18 | keratin 18 | NM_000224.3
43 | KRT6A | keratin 6A | NM_005554.4
44 | KRT6B | keratin 6B | NM_005555.4
45 | KRT81 | keratin 81 | NM_002281.3
46 | KRT8 | keratin 8 | NM_002273.3
47 | LAD1 | ladinin 1 | NM_005558.3
48 | LCK | LCK proto-oncogene, Src family tyrosine kinase | NM_005356.5
49 | LGALS4 | galectin 4 | NM_006149.4
50 | LYPD1 | LY6/PLAUR domain containing 1 | NM_144586.6
51 | MARVELD3 | MARVEL domain containing 3 | NM_052858.5
52 | MEG3 | maternally expressed 3 | NR_046473.1
53 | MUC13 | mucin 13, cell surface associated | NM_033049.4
54 | MUC16 | mucin 16, cell surface associated | NM_024690.2
55 | MUC4 | mucin 4, cell surface associated | NM_018406.7
56 | MYCN | MYCN proto-oncogene, bHLH transcription factor | NM_005378.6
57 | NAPSA | napsin A aspartic peptidase | NM_004851.3
58 | NKX3-1 | NK3 homeobox 1 | NM_006167.4
59 | NPR1 | natriuretic peptide receptor 1 | NM_000906.4
60 | PAX8 | paired box 8 | NM_003466.4
61 | PRAME | preferentially expressed antigen in melanoma | NM_206956.3
62 | PSCA | prostate stem cell antigen | NM_005672.5
63 | PVRL4 | nectin cell adhesion molecule 4 | NM_030916.3
64 | S100P | calcium binding protein P | NM_005980.3
65 | SALL4 | spalt like transcription factor 4 | NM_020436.5
66 | SFTPD | surfactant protein D | NM_003019.5
67 | SILV | premelanosome protein | NM_006928.4
68 | SIT1 | signaling threshold regulating transmembrane adaptor 1 | NM_014450.3
69 | SLC26A4 | solute carrier family 26 member 4 | NM_000441.1
70 | SLC3A1 | solute carrier family 3 member 1 | NM_000341.3
71 | SLC45A3 | solute carrier family 45 member 3 | NM_033102.3
72 | SOX17 | SRY-box 17 | NM_022454.4
73 | SPDEF | SAM pointed domain containing ETS transcription factor | NM_012391.3
74 | SPINT2 | serine peptidase inhibitor, Kunitz type 2 | NM_021102.4
75 | TCEAL5 | transcription elongation factor A like 5 | NM_001012979.3
76 | TG | thyroglobulin | NM_003235.5
77 | TMEM27 | collectrin, amino acid transport regulator | NM_020665.6
78 | TP63 | tumor protein p63 | NM_003722.5
79 | TRPS1 | transcriptional repressor GATA binding 1 | NM_001330599.1
80 | TSPAN8 | tetraspanin 8 | NM_004616.3
81 | UPK3B | uroplakin 3B | NM_001347684.1
82 | VTN | vitronectin | NM_000638.4
83 | ZNF578 | zinc finger protein 578 | NM_001099694.2
84 | ZNF695 | zinc finger protein 695 | NM_020394.5

*Each GenBank Accession Number is a representative or exemplary GenBank Accession Number for the listed gene and is herein incorporated by reference in its entirety for all purposes. Further, each listed representative or exemplary accession number should not be construed to limit the claims to the specific accession number.

TABLE 2 Nearest centroid classifier coefficients of 84 Gene ClassifierBiomarker Signature for Pan-Cancer COCA subtyping. C4 C6 C8 Gene C1 C2C3 (Squamous- (LUAD (PAAD/some C9 # Symbol (ACC/PCPG) (GBM/LGG) (OV)like) enriched) STAD) (UCS) 1 A1BG 1.591560699 0.190424 0.486501004−0.428197254 0.412635759 −0.184279 2.001627705 2 ACPP −3.165733781−3.12929 1.422642856 1.55810748 1.220541761 0.3613543 −0.523609534 3APC2 5.927921166 9.535164 −0.596869926 0.426709368 −0.2355502480.211678867 0.296218394 4 AQP5 0.913265915 −1.5756 6.0771996180.038435948 3.968116521 5.439757901 4.689537595 5 ASGR1 0.2003829412.23723 −0.270715575 −0.385421722 0.9311113 0.377002269 1.767020071 6BCAN 3.407338299 11.97624 −1.053982755 −0.662093738 −0.7291790330.649031299 0.667042943 7 BCL2L15 −0.658510708 0.077946 0.865164587−0.382856173 4.273114175 5.648586443 −0.38038599 8 C1orf172 −7.367511726−8.11012 0.639328401 0.482516142 0.16674242 −0.401651641 −2.035170988 9CAPS −0.328076695 −0.34918 2.841784698 −1.13544035 1.0752576290.923666447 0.259472216 10 CBLC −8.155167351 −8.15517 0.5593632991.283503926 0.054904876 0.803955968 −1.909451133 11 CDH1 −11.31993378−6.63507 −0.611897157 0.320830787 0.259254437 0.32354087 −2.417624306 12CEACAM5 −4.263447619 −4.26345 −2.162761329 5.453807243 8.010400428.25154716 −1.643025427 13 CEACAM6 −6.692665202 −6.69267 −1.4110843643.660636032 8.117243572 7.181029674 −2.853743681 14 CHMP4C −4.851564145−6.7881 0.477705202 0.092903499 0.246039693 0.425978288 −1.619930594 15CLCA2 −1.916026013 −0.55212 −2.135469493 10.56468928 0.196047759−0.982688048 0.230995571 16 CLDN4 −7.769800248 −8.52437 1.293661429−0.21590352 0.923225733 0.698989001 −2.196157068 17 COL11A2 0.9947197263.794411 0.981227344 −0.355884432 −0.290171902 −0.134024256 5.74700619318 CRB3 −6.855921321 −6.69387 0.314038459 −0.051924366 0.425966280.230134355 −2.12978152 19 CTSE −2.769179309 −2.76918 0.690060563−0.068850571 8.211338748 10.27849106 −1.795164025 20 CUBN 1.4175950671.109969 −1.06751464 −1.098119051 
0.200954954 0.214406457 1.001364945 21CYP2B7P1 −0.494004573 0.493388 −0.20904464 −0.38882331 7.0202402421.642138931 0.554846851 22 DLX5 0.646764837 −0.46515 −0.5434892193.043312263 −1.037762982 −0.615048606 3.664621768 23 DMGDH 1.359832881.246526 −0.621678415 −1.957477961 0.403167757 0.050041264 −0.04215126824 ELF3 −9.499613685 −8.35834 1.621031579 0.270463158 1.2619783641.664172061 −1.774337463 25 EMX2 −1.445515678 4.196057 7.930390253−0.743521137 −3.08024833 −1.20337606 5.823681705 26 EMX2OS −1.405999534.680959 6.731039756 −1.182024547 −3.142205278 −1.044827924 5.39941381927 EPCAM −4.140528206 −8.15621 1.301641135 −0.969949548 1.3499819211.210777667 −0.808127169 28 ERBB3 −6.795539466 −2.20761 −0.171425783−0.850309305 0.543205553 0.488891046 −1.571378197 29 ESR1 −1.563757872−2.7205 4.824366357 −0.395784527 0.80814913 −0.646508959 −0.041535695 30FAM171A2 2.31912146 3.133851 2.782191887 −0.20635555 0.23640564−0.553185081 4.13232199 31 FOLH1 0.35530613 1.32629 0.070865602−0.437336881 −0.404293953 −0.516227658 0.984459412 32 GABRP −3.114382282−3.80051 2.138616034 2.054140007 −0.13455569 5.495459292 0.876924899 33GATA3 3.645314335 −4.46319 1.041365419 0.137891422 −0.309397160.245236202 −0.485018337 34 GCNT3 −3.715677872 −4.43208 −0.4252190531.24405515 3.332011974 6.140028092 −2.627063776 35 GPC2 0.3276817143.748559 0.567306982 0.383564177 0.020301437 −0.493055636 4.27187925 36GPR35 −1.123275158 0.288748 0.479762937 −0.401845781 0.6123774823.964562506 1.02539215 37 GPRC5A −5.029113731 −6.47264 −1.039887690.471927094 2.504432044 2.440959491 −1.902607145 38 GRHL2 −9.186320721−9.01009 0.20905294 0.759772508 0.222957527 −0.501711577 −2.025188838 39HNF1A −0.226398309 0.326606 −0.566429337 −0.634513541 −0.0563972833.532952512 0.664824374 40 HPX 0.285105569 −0.08725 −0.761181725−0.99545754 0.077064824 0.031571949 0.655698572 41 IYD −3.48501457−3.48501 −2.700731814 −1.942508868 2.764256302 4.516203307 −2.33816387442 KRT18 −6.139551755 −9.93225 0.824379787 −1.150398992 
0.3661931690.669054426 −1.559549225 43 KRT6A −3.978012535 −3.97801 3.98570515313.04837042 1.834983617 2.826759232 0.495449701 44 KRT6B −3.679513879−3.67951 −0.284781052 10.86983161 −0.095874456 4.031758881 −0.31965540745 KRT81 −1.52156723 −2.24431 1.219674513 0.808360415 0.6852874950.157557306 1.332452992 46 KRT8 −9.333378281 −12.0127 0.923200159−0.888982445 0.25864778 1.104030262 −1.049517897 47 LAD1 −9.54391772−9.93659 0.152727678 2.525900274 1.096406409 2.02745431 −1.338565911 48LCK −2.653249024 −4.15782 −0.449932121 0.595456972 1.1148731691.294851611 −0.905250193 49 LGALS4 −1.069860082 −0.88856 −0.804776299−0.636711502 0.902648195 9.957840161 0.300844381 50 LYPD1 0.1613567154.620573 6.06218977 0.619464566 0.866852785 1.287462226 2.488687449 51MARVELD3 −6.499693064 −1.92762 −0.006137317 0.236649367 0.1105332070.419749056 −1.459194862 52 MEG3 6.987769361 4.00401 0.481443128−0.367973396 0.187829444 2.037867641 5.549924389 53 MUC13 −1.096164161−1.549 −0.857929123 −0.216081121 3.60927036 9.342999034 −0.087884353 54MUC16 −2.940429889 −2.94043 8.971152269 3.553030115 5.9227590272.391002348 2.159619147 55 MUC4 −2.659938287 −1.76141 1.0139378994.360400601 6.293886393 4.455995593 0.8936279 56 MYCN 2.6350013513.722476 3.48370589 −0.476428956 −0.4996297 −0.139261692 4.259965299 57NAPSA −1.134449647 −0.6277 −0.350262749 0.121656227 11.53466505−0.206023725 −0.551300842 58 NKX3-1 0.643122217 −0.8988 −1.0129282670.87446207 0.131533121 0.091873999 −0.013147747 59 NPR1 1.562673445−1.70035 4.826134025 −0.898468302 0.426172792 0.791217063 0.909336394 60PAX8 −1.207977403 −2.70163 6.131772035 −0.109570392 −0.5075680170.425575927 0.567460764 61 PRAME −2.720513358 −3.0116 7.4177380654.107800576 1.634537806 −0.732218434 8.835740874 62 PSCA −2.62692522−1.51466 1.088832807 2.403172672 0.853737331 6.468730165 −0.867506614 63PVRL4 −7.123332103 −7.84158 0.298515276 2.072952358 1.113959640.537458726 −2.175556637 64 S100P −4.176354266 −4.39339 −2.0397088392.60206496 3.54339421 5.588006332 
−0.835327459 65 SALL4 −0.350139755−1.98283 0.992610931 0.242440795 0.631851425 1.50741502 2.172219002 66SFTPD 1.229072592 0.156327 1.095837733 0.844728495 6.616682345−0.138899031 −0.528759079 67 SILV −1.601355906 −3.16131 0.436716938−0.17041022 0.318281992 0.27070394 0.049533868 68 SIT1 −2.339171989−3.35217 −0.160872017 0.621892503 1.468402409 0.74432008 −1.226672618 69SLC26A4 −0.01413008 0.420072 −1.783069972 −0.415510022 0.74827946−0.100896769 −0.464161483 70 SLC3A1 1.225854746 0.996711 −0.32436489−1.197295399 −0.603148658 5.542018189 0.717943528 71 SLC45A3−2.005759994 −0.41185 −0.428424528 −0.241045139 0.557739049 1.7262909530.128310421 72 SOX17 0.824116164 −0.22888 6.125476978 −1.080390132−0.166935984 0.604218197 1.286142427 73 SPDEF −2.615781968 −2.03453.94966981 −0.755535616 4.925193925 4.243866593 0.699764031 74 SPINT2−2.997432839 −4.83916 1.007795827 0.294659358 0.15758716 0.166037725−0.979250422 75 TCEAL5 4.349410995 5.379822 1.642558611 −0.898540151−0.528496105 0.788010053 4.759050684 76 TG 2.696748103 −0.10465−1.217878931 0.390892921 −0.793389805 −0.297668711 1.286415669 77 TMEM27−0.42619294 −0.29365 −0.1091435 −0.496636878 0.747255703 0.386605101−0.460703305 78 TP63 −2.443322255 −2.69429 −1.072539715 8.0797730171.080093521 −0.122917429 0.715521461 79 TRPS1 −0.827302587 0.827571.115573024 0.379838983 −0.553191739 −0.163032265 1.067422295 80 TSPAN8−1.517176876 −1.38543 1.264805902 −0.971215985 4.123187886 8.1201192831.88608684 81 UPK3B −1.800031107 −1.79259 6.496391778 2.5914651891.916362767 1.31370249 1.253864936 82 VTN 4.532732542 0.962046−0.35391519 −0.827839727 0.374371855 3.646375202 0.21090879 83 ZNF5781.940365745 2.606116 1.274215935 −2.128937852 −0.532541888 −0.121768261.417239081 84 ZNF695 −2.395893789 −0.97465 2.29727236 1.015039672−0.170693901 −0.682198412 2.909908974 Gene C10 C12 C14 C15 C16 C17 C19 #Symbol (BRCA/Basal) (UCEC) (PAAD) (CESC) (BLCA) (TGCT) (COAD/READ) 1A1BG 0.142304769 −0.093163359 −1.141696682 −1.152290675 
−1.297400420.256130124 −1.788698924 2 ACPP −1.398401725 −0.082101813 10.450647241.42121 0.266257162 0.477828173 0.992457174 3 APC2 −0.572616388−0.977273763 0.32549264 0.054659498 0.405219188 1.45040744 0.15536217 4AQP5 3.702869943 6.247684679 −1.136288477 8.250686508 −2.4131795162.019598373 −0.061400955 5 ASGR1 −0.374333329 −0.307671908 −1.422479203−0.69544287 −0.284590427 0.811900301 0.284656903 6 BCAN −0.843138597−0.389425497 −2.101665722 0.113052513 −0.411742591 2.4511025371.398031108 7 BCL2L15 −0.509343675 2.21556825 −0.416331352 5.642847062−0.076320616 −0.539990486 5.956925607 8 C1orf172 0.377167732 0.5410787440.28000935 0.529249282 0.78721152 −0.331669012 0.205165485 9 CAPS1.236614303 3.401999923 1.140742546 3.129590389 3.356392461 −1.598575877−1.041036365 10 CBLC 0.528896861 0.860866464 0.983689368 1.3668589652.004650922 −1.590854285 1.462118274 11 CDH1 0.147729378 0.0165066310.816575199 0.187211685 0.578678097 −1.440668807 0.796827631 12 CEACAM5−0.578531195 −0.226926702 0.223892364 8.216176042 2.697562315−3.025546808 11.26786682 13 CEACAM6 0.378171192 −0.861664584−1.150176451 5.321201782 1.780316958 0.990910873 7.545948521 14 CHMP4C0.932335566 0.332852097 0.654936052 −0.061505461 0.701272543−2.076344669 0.955706036 15 CLCA2 1.252518338 −0.870513832 0.8349940650.083375317 5.899414972 0.277783481 −1.434642506 16 CLDN4 0.5965060640.783384706 0.695706592 0.872264084 1.666131843 −4.130304775 1.50630759417 COL11A2 1.152615114 1.094477267 −0.512589565 0.34882784 −0.2770124970.921823497 −0.972050734 18 CRB3 −0.252190917 0.447730499 0.3142621730.485822293 0.422819651 −1.015012817 0.63102577 19 CTSE −1.579272399−0.27402142 −0.123593233 4.329326574 3.964690014 0.131319517 6.50985151620 CUBN −0.042341755 0.692484506 −0.083311834 −1.016397158 −1.291707376−0.060942034 −1.2605976 21 CYP2B7P1 −0.793782659 −1.135804901−0.048029795 2.920741779 −1.769335752 −1.106667891 0.57204011 22 DLX51.065680224 6.63897818 1.203219318 0.244733336 2.095413211 −0.874548488−2.278703058 23 
DMGDH −1.016515828 −0.716060074 0.747670367 −1.492260644−2.335607422 −0.511890494 −1.983626926 24 ELF3 0.867636291 1.089943222−0.33693125 2.161246514 1.979945601 −2.024525261 1.906780054 25 EMX20.398485049 7.622270699 −1.139800617 3.497783132 2.434679878−0.070065196 −2.337175682 26 EMX2OS 0.35047227 6.672513444 −0.7711694692.651126602 1.918119816 0.003917034 −2.749389229 27 EPCAM 1.0709590881.65345969 0.656584363 1.491420181 −0.23366156 0.118017267 2.50274430528 ERBB3 0.156681689 −0.354419176 0.85480606 0.70559009 0.245664699−1.125976575 1.142628311 29 ESR1 −0.114307986 4.969542469 1.6039822832.334162784 −0.866991852 −1.616321139 −2.643912362 30 FAM171A21.110078033 2.742226868 −0.598712372 1.23539718 0.196442464 2.52966238−2.299101465 31 FOLH1 1.342626401 1.95199662 7.506581427 −1.68931756−1.160417237 −1.194199537 −0.990382338 32 GABRP 8.062957771 3.4976052483.292130436 7.257298478 −1.104614063 −2.409859347 1.866998203 33 GATA32.883744656 −1.322536343 0.362300268 0.172939031 5.863126341 0.347938903−1.305257494 34 GCNT3 −2.229182519 1.453042898 −1.612817806 4.45653155−2.101854264 −1.198296139 5.978745374 35 GPC2 1.844529239 1.87662419−0.31412767 −0.281008532 1.248099969 3.603718475 −0.446098297 36 GPR35−0.201147094 1.145430196 −0.633581001 3.403228537 −0.0749260630.037914492 4.997511501 37 GPRC5A 0.136594075 −0.047037042 −1.7293891571.713890286 0.947387044 1.099444707 2.341085357 38 GRHL2 0.8879328510.608466746 1.846078681 0.587744821 0.927526179 −4.112841993 0.29640305339 HNF1A −0.757149534 0.998266857 −0.224121716 3.083804118 −0.876897763−0.071449343 4.816960704 40 HPX 0.083502266 0.467027224 1.898436273−0.011129604 −0.443467655 1.739428473 −0.713289667 41 IYD −1.986600368−1.163865741 1.126629676 1.43730223 0.098371764 −3.48501457 5.82117450642 KRT18 −0.774406938 0.534341861 0.455271746 1.038944818 0.900515088−0.471025112 1.117607068 43 KRT6A 3.285148104 2.483810663 −0.0732423024.424176683 4.698929835 −3.375318162 0.482454829 44 KRT6B 6.929849448−0.029392703 
−0.881504967 3.541676869 2.041513217 −3.6795138792.619439855 45 KRT81 3.704809399 1.125852117 −1.180411884 1.855047045−0.224822743 −1.682654755 −1.136304837 46 KRT8 0.117987473 0.541422454−0.071910723 1.371770589 1.338626534 −0.317434922 1.536661174 47 LAD11.117718225 0.900829355 −0.184554726 1.845819432 2.28378527 −1.02947412.021237694 48 LCK 0.323828061 −0.2698455 −0.135093489 0.809447978−0.427656591 1.876095595 0.766125516 49 LGALS4 −1.049805056 −0.6584452910.33823429 2.270150332 1.671819995 1.025358005 10.63263886 50 LYPD10.318704537 2.561660067 −1.46291243 0.106783622 −0.846380974 0.579452651−1.457927861 51 MARVELD3 0.594064846 0.59864298 0.746088507 0.6732201780.379141989 −1.349340309 0.960362538 52 MEG3 −0.760697048 −0.5595062460.124435896 −0.790286218 0.176547542 7.212428882 −0.102624496 53 MUC13−0.415482063 2.373448458 3.347855433 9.337472272 −1.260341969−1.137285921 11.16914163 54 MUC16 5.749271478 7.257302838 −0.3801175499.47128499 −0.667829704 2.503910209 −1.68279375 55 MUC4 −0.5336492891.580638241 1.704056599 7.6185769 1.358543899 3.416007758 4.848907641 56MYCN 1.167011131 1.749261819 −3.028046397 0.244478213 0.6931710896.57493164 0.586544197 57 NAPSA −0.568068159 0.272590794 −0.2676503470.491103798 −0.068387887 0.784909839 −0.516593433 58 NKX3-1 0.636142588−0.165152927 8.791444726 1.213527947 0.111824365 0.401176282−1.098810432 59 NPR1 −0.108559874 −0.014674711 −0.783482295 −1.047763786−1.086165265 −0.51156436 −1.013472417 60 PAX8 −1.679043287 6.175626455−1.300676688 3.829761838 −0.264941996 0.019241642 0.199759138 61 PRAME5.753805128 8.637405593 −1.641025038 2.240303364 −1.86413095 6.2493240240.732324816 62 PSCA 2.380934424 0.822962284 5.024448953 5.0745902379.409030075 −1.378514517 0.823284273 63 PVRL4 1.983169736 0.543307960.274629626 1.209370824 2.956273849 −1.709650966 −0.443057331 64 S100P3.387292842 0.722588023 0.264728445 6.102225171 7.587401555 −0.7450100366.222215668 65 SALL4 −0.212011363 −0.183118884 −0.428196415 1.7366976091.763666611 
6.22003894 1.160887388 66 SFTPD −0.864741869 0.187431990.85637944 −0.169964303 −1.953003098 0.810342471 −3.059355397 67 SILV−0.445152541 0.023561013 −0.906600385 1.276911935 0.671250555−0.419750375 0.364343543 68 SIT1 0.404484797 −0.245179126 −0.698370063−0.228329389 −0.700949754 1.862224753 0.160038551 69 SLC26A4−0.242562803 −0.736362299 4.542023228 0.049925703 −1.244893781−1.393279547 −0.773795962 70 SLC3A1 −0.811991759 1.503175223 0.813376410.685335393 −0.830561409 0.02998455 4.234950007 71 SLC45A3 0.50351858−0.257430773 7.798304016 0.259038112 0.761165507 1.057588337 1.25301144472 SOX17 −0.436621464 6.590489885 0.258734252 0.190327448 −0.5370740634.1051736 −0.639722737 73 SPDEF 2.428928058 4.878948809 9.3962000855.896810841 0.785079512 −1.183433789 3.426972043 74 SPINT2 0.1995531490.669655069 0.202002754 0.813009857 0.747461864 −0.67968284 0.22435266475 TCEAL5 −0.580215651 0.283236555 1.291973501 −0.314854034 −0.2076307491.239038231 −1.642797685 76 TG −1.043977276 −1.109552249 1.299196722−1.24118987 −0.588195791 −0.92548737 0.175292614 77 TMEM27 0.2481023590.030206129 0.611157159 −0.754191803 −0.750158332 1.251273614−1.235570332 78 TP63 1.282401189 −1.071684619 3.203409462 −1.131395726.245170227 −0.345850774 −2.304303836 79 TRPS1 3.153356243 1.2483823340.226726961 0.003939999 −2.35803211 −1.170459499 −1.794640325 80 TSPAN8−1.77985797 0.74692949 6.457395474 4.704465483 −0.385074926 −0.4355730018.97062099 81 UPK3B 0.43781618 1.273683944 −0.408516441 2.1519438127.898733759 −0.735658973 −0.139912799 82 VTN −1.022686104 −1.195484585−0.273276881 −1.471670906 0.728314508 3.181634913 −0.582351194 83 ZNF578−0.128482728 −0.291752675 0.60523606 −2.466912276 −1.1158587455.469695435 −1.515460348 84 ZNF695 2.994868611 2.77909196 −1.0751683442.286781576 1.644923103 3.609635955 1.952089723 C21 Gene C20 (KIRK/ C22C24 C25 C26 C28 # Symbol (SARC/MESO) KICH/KIRP) (Liver) (BRCA/Luminal)(THYM) (SKCM/UVM) (THCA) 1 A1BG 0.070192984 −0.940840591 8.7034135430.936369381 −1.706237879 
1.914831468 0.703529067 2 ACPP −2.032541038−2.305500467 −4.40459131 −1.024439493 −2.066553063 −3.7414682990.699426376 3 APC2 0.552360013 −0.679752407 −0.062564922 −0.527174725−0.177801985 1.332087657 −1.406046015 4 AQP5 −1.419094969 −2.627502329−2.379706191 −0.280015058 0.67166607 −1.438430189 3.948999547 5 ASGR10.462676231 −0.369411855 9.174063668 −1.440166184 −0.048667321−1.316415138 1.003468362 6 BCAN −0.209671799 0.222422862 0.598814814−0.685014857 −2.971866954 7.441867064 −2.689173959 7 BCL2L15−0.946597411 0.308006633 −0.847993525 −0.339415371 0.44529515−1.396188006 −1.219332417 8 C1orf172 −7.714313141 −2.053892696−1.961055619 0.097896284 −1.121520119 −5.883858505 0.420348593 9 CAPS−0.499052174 −1.362494555 −0.691670099 0.575620059 0.228128949−0.326249828 1.883752499 10 CBLC −8.155167351 −3.648064102 0.2836660770.197383744 −8.155167351 −8.155167351 −4.683902455 11 CDH1 −7.229104847−1.902809546 −0.803408059 0.453092886 −1.794492653 −0.3806972541.07759058 12 CEACAM5 −4.263447619 −4.263447619 −4.263447619 3.43408992−4.263447619 −3.826859325 −3.766911393 13 CEACAM6 −6.003097658−6.22642964 −6.020466183 2.793954656 −5.749231419 −6.088214272−1.029763418 14 CHMP4C −6.433893852 −0.448980057 −0.4775651540.166274869 −3.416609246 −6.4170014 0.423295456 15 CLCA2 1.090083657−1.916425099 −2.448293094 2.423949605 0.511588318 −0.16703996−0.923829634 16 CLDN4 −6.877306455 −0.800472122 −4.850938463−0.062787487 −6.750496629 −8.258170243 1.193098579 17 COL11A20.407171494 −0.522028403 −1.162577547 −1.391989324 2.8221334464.012255459 0.998530701 18 CRB3 −6.899881762 0.409378715 0.082599609−0.202979549 −3.405323461 −5.927493049 0.442041688 19 CTSE −2.1042414310.366273475 −1.243260786 −2.250040026 −2.122740123 −1.8261957115.586773138 20 CUBN 0.713604848 7.281399905 −1.384529654 −0.1455367971.410829201 3.566340978 3.20938295 21 CYP2B7P1 −0.654635147 −1.4471769185.139802834 6.622628205 −1.379336644 −1.212199304 −0.964049728 22 DLX50.462490967 0.017699215 −3.265329736 −0.745686562 
−2.117303943−0.829299281 −0.068606829 23 DMGDH 0.028856242 6.060504719 6.702437958−0.060898439 −0.46134749 −1.489004211 2.38214753 24 ELF3 −7.606527581−0.555341599 −0.334664939 0.302735788 −5.416983065 −8.778305188−1.61788093 25 EMX2 2.802771238 7.074822861 −3.08024833 1.035384246−1.557188268 −0.213597469 −1.488833995 26 EMX2OS 2.520378612 7.353718721−3.593640608 1.294980889 −1.893334068 −0.333268626 −1.117234841 27 EPCAM−8.94327619 −1.582907817 −6.179887918 0.150227096 −3.89203662−9.927578427 1.469515171 28 ERBB3 −7.397006842 0.362674201 0.6763726571.354976731 −7.751611174 1.393383686 −0.864197462 29 ESR1 −0.2042637710.353961842 −0.001186471 7.045755877 −0.917732083 −0.5673454551.658056187 30 FAM171A2 0.843521911 −1.147803639 −1.94488491 −0.67709113−0.913518602 −0.123166538 2.177241264 31 FOLH1 −0.285087483 2.1070206342.114399746 −0.44262496 −3.693628465 −1.90866141 −0.126630866 32 GABRP−2.303619388 −1.113604356 −2.60527593 2.880519078 −2.969804846−0.477412964 −2.451544871 33 GATA3 −0.479072713 −0.694342037 −2.420000977.039537351 2.449239776 −2.642455615 1.305924613 34 GCNT3 −3.3699322143.012084991 0.745283917 −3.199989256 0.677330491 −4.065793513−0.452261295 35 GPC2 −0.655639972 −1.345174813 −2.097850016 −0.5206439832.326152106 −0.721335705 −2.085120953 36 GPR35 −0.15384199 0.320806564−0.470378252 −0.642499168 1.115849375 −1.434965852 −1.068214302 37GPRC5A −1.978637673 −3.708321529 −7.617162596 1.869297678 −7.799087331−2.526958538 1.836642695 38 GRHL2 −8.760984555 −8.300732853 −8.413800760.977699908 −2.733003314 −9.346544321 −0.260490678 39 HNF1A −0.4600046485.003097445 5.740038488 −0.419983585 −0.254057708 −0.06916107−1.339783101 40 HPX 0.428825996 −0.644318213 12.34302566 4.13452407−0.715914255 −1.121106629 −0.927626931 41 IYD −3.48501457 1.833595125.111462206 0.63980155 −3.48501457 −3.48501457 9.864317761 42 KRT18−5.393634178 −0.214336662 0.659378292 0.645722621 −3.361767124−6.292360598 0.141599862 43 KRT6A −2.750394666 −3.593741642 −3.978012535−1.007177282 
1.775950277 −1.756754105 −1.199299426 44 KRT6B −2.809563532−3.679513879 −3.007677051 3.038039173 1.074967474 −0.771604139−2.767720312 45 KRT81 −0.708340478 −0.549017438 −1.474487037 1.7192786980.293233459 −2.832035518 1.29934714 46 KRT8 −6.585518291 −0.4453289580.292357177 0.583940663 −1.211875599 −7.214684995 0.11941998 47 LAD1−6.366889981 −3.487703983 −0.03314457 −0.937855879 −2.032988069−4.586601984 −0.415235244 48 LCK −0.077998068 0.355214635 −0.747932263−0.312039768 5.221543649 −1.853755715 −0.526016141 49 LGALS4−0.883963009 1.366073433 9.773901918 −1.137042557 0.240641913−1.598817352 −0.203473823 50 LYPD1 0.387124296 −0.528951531 0.547288234−0.876913408 0.13379984 0.661428164 −0.897772519 51 MARVELD3−5.995262907 −0.510050567 −0.675659361 0.322410532 −2.735366823−6.383666653 −0.153513769 52 MEG3 2.478280919 −2.166933932 −0.3208635980.160100014 2.853324939 −2.730675804 −3.166451005 53 MUC13 −0.9818540021.997077234 6.771194467 −1.618112428 −1.772933448 −2.419188288−1.948580943 54 MUC16 −1.427646768 −1.678538069 −2.940429889 1.60919−1.668765178 −2.940429889 1.33450999 55 MUC4 −3.199831606 −0.551376679−2.460727016 −2.202007274 −1.442945771 −4.022799483 −1.386650033 56 MYCN−1.115153774 −1.072643704 −0.572987836 −0.563222061 2.256763699−1.121103788 0.383429385 57 NAPSA −0.441224683 5.357444831 −0.330311961−0.680826272 1.54889432 −1.10948967 3.230552348 58 NKX3-1 0.34424304−0.695607881 −0.636037275 1.380598103 −0.358807355 −1.225980043−1.347772078 59 NPR1 3.611141372 2.936608674 0.739290647 −0.02018339−0.04911732 −1.692369146 0.560520251 60 PAX8 −0.438408776 5.138704009−0.439224734 −1.115572722 −0.742610195 −1.098989328 7.330352514 61 PRAME−2.28601297 3.648546027 −1.904681098 −1.204180981 3.6017196039.250663624 −3.172944285 62 PSCA −1.664873567 −2.372135296 −2.321377491.204841172 −1.580925941 0.391748243 −3.189626014 63 PVRL4 −4.983285402−6.308620694 −6.330798801 1.140190077 −3.872718208 −6.3592720340.126583059 64 S100P −4.27172475 −4.04897439 0.499095434 
2.122747189−4.765549062 −4.663901764 −4.574712189 65 SALL4 −0.701854731−2.670674063 −1.068969101 0.308484856 −1.113633609 2.2196997730.019104844 66 SFTPD −2.015504097 −0.059979562 −1.883528068 −0.664564777−1.784683096 −1.174451747 2.77077955 67 SILV −0.404015329 0.4033523231.696442392 0.346443524 −1.809580739 12.3265592 −0.064184971 68 SIT10.560522874 0.56641542 −0.448541127 0.083905789 6.077538122 −1.010014644−0.588029822 69 SLC26A4 0.245073029 0.600404783 −1.143209401 0.0772511790.024544315 0.276126125 7.424523266 70 SLC3A1 −0.360884127 10.502655571.764194349 −0.750224836 −0.620498156 −0.303626634 −0.228977476 71SLC45A3 −1.119761006 0.021163292 0.733358084 −0.480014382 −2.00381626−1.453023157 −2.254242818 72 SOX17 0.817442439 0.837037779 0.3452605820.26054328 −0.203888143 −1.596069152 0.733884158 73 SPDEF 2.697057213−2.838870874 −2.854924213 7.609282727 −3.084262875 −2.745456716−2.779572185 74 SPINT2 −3.920290094 −0.542550342 −5.8416941830.163004523 −0.712680222 −5.966064445 0.738611694 75 TCEAL5 0.881390897−0.79947069 −2.066499307 0.638825329 0.319737925 −0.1147662420.788194285 76 TG 0.984073587 0.60666684 −1.221557274 −1.1110886431.124956459 −0.727984197 15.10003804 77 TMEM27 −1.803900196 6.8750823230.633259141 −0.517663186 −0.292695792 −0.470425257 1.189669219 78 TP63−0.665339268 −2.068568045 −1.9563835 1.974020464 5.386578658−2.022259539 −0.159043543 79 TRPS1 0.01584306 −0.129451899 −2.2320723474.425365059 −1.181551745 −1.947966145 −0.277029596 80 TSPAN8−1.889160765 −1.739590705 5.986779156 −2.397233332 −1.9296585−4.08731649 −2.677540765 81 UPK3B −0.660628051 −0.514196839 −0.477166533−0.755620892 −0.069584588 −1.061067781 −0.712380093 82 VTN 2.923525274−0.52452731 14.37513361 −0.702306293 −0.029929938 1.884788007−0.627880077 83 ZNF578 1.22614863 0.540308545 −1.87596215 −0.0831970770.429479376 −0.209295458 2.232739055 84 ZNF695 −0.449132017 −2.051634999−3.038221841 0.76346101 0.74872153 −0.970490477 −1.99162398

In one embodiment, a subset of one or more of the 84 genes of Table 1 can be used to classify or determine the COCA subtype of a tumor sample. In one embodiment, all 84 genes of Table 1 can be used to classify or determine the COCA subtype of a tumor sample. In some embodiments, the up-regulation of a classifier biomarker (e.g., expression is increased) can refer to an expression value that is positive (i.e., higher than zero) relative to a reference or control as provided herein. In some embodiments, the down-regulation of a classifier biomarker (e.g., expression is decreased) can refer to an expression value that is negative (i.e., lower than zero) relative to a reference or control as provided herein. In some embodiments, a classifier biomarker may have no specific effect on a certain COCA subtype when the expression level equals zero.
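The sign convention described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the Table 2 coefficients can be read as expression values relative to a reference; the function name `regulation_call` and the `tol` parameter are hypothetical and do not appear in the source.

```python
def regulation_call(value, tol=0.0):
    """Label an expression value measured relative to a reference or control.

    `tol` is a hypothetical tolerance (an assumption, not from the text) for
    treating values very close to zero as zero; the default of 0.0 applies
    the positive/negative/zero rule literally.
    """
    if value > tol:
        return "up-regulated"
    if value < -tol:
        return "down-regulated"
    return "no specific effect"


# Rounded Table 2 entries for three genes in the C28 (THCA) column.
c28_values = {"TG": 15.100, "PAX8": 7.330, "KRT6A": -1.199}
calls = {gene: regulation_call(value) for gene, value in c28_values.items()}
print(calls)
```

Under this reading, TG and PAX8 would be called up-regulated and KRT6A down-regulated for the C28 (THCA) subtype.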

In some embodiments, determining integrated, pan-cancer COCA subtypes can further include measuring the expression of at least one biomarker from an additional set of biomarker classifiers. In one embodiment, an additional set of biomarker classifiers can include measuring gene signatures related to cell proliferation. The gene signatures related to cell proliferation for use in the methods provided herein can include the 11 gene signature comprising BIRC5, CCNB1, CDC20, CDCA1, CEP55, KNTC2, MKI67, PTTG1, RRM2, TYMS, and UBE2C found in Martin M. et al., Breast Cancer Res Treat, 138: 457-466 (2013), the 18 gene signature found in US 20160115551 and/or the 26 gene signature found in 62/789,668 filed Jan. 8, 2019, each of which is herein incorporated by reference. In one embodiment, an additional set of biomarker classifiers can include a 5 gene signature comprising tumor driver genes such as TP53 and RB1, and receptor tyrosine kinases including FGFR2, FGFR3, and ERBB2. In one embodiment, the 5 gene signature is related to the signature of tumor driver genes. In one embodiment, the biomarker classifiers can also include immune cell signatures that are known in the art (Bindea G. et al., Immunity, 39(4): 782-95 (2013); Faruki H. et al., JTO, 12(6): 943-953 (2017); Charoentong P. et al., Cell reports, 18, 248-262 (2017); Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830; and/or WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference). In one embodiment, an additional set of biomarker classifiers can include assessing tumor purity, such as ABSOLUTE-derived tumor purity from the TCGA supplementary data. In one embodiment, the additional set of biomarkers can be gene signatures known in the art for specific types of cancer. 
In one embodiment, the cancer is lung cancer and the gene signature is selected from the gene signatures found in WO2017/201165, WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153, each of which is herein incorporated by reference in its entirety. In one embodiment, the cancer is head and neck squamous cell carcinoma (HNSCC) and the gene signature is selected from the gene signatures found in PCT/US18/45522 or PCT/US18/48862, each of which is herein incorporated by reference in its entirety. In one embodiment, the cancer is breast cancer and the gene signature is the PAM50 sub-typer found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety. In one embodiment, the cancer is bladder cancer and the gene signature can include the bladder cancer biomarker signature described in Gene Expression Omnibus (GEO) dataset: GSE87304, Seiler R. et al., Eur Urol, 72(4):544-554 (2017); or Gene Expression Omnibus (GEO) dataset: GSE32894, Sjodahl G. et al., Clin Cancer Res, 18(12):3377-86 (2012), each of which is herein incorporated by reference. In one embodiment, the cancer is bladder cancer (e.g., MIBC) and the gene signature can include the bladder cancer biomarker signatures described in 62/629,975 filed Feb. 13, 2018, which is herein incorporated by reference. In one embodiment, the cancer is bladder cancer (e.g., MIBC) and the gene signature can include the bladder cancer biomarker signature described in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference.

In some embodiments, determining integrated, pan-cancer COCA subtypes can further include assessing tumor mutation burden (TMB) and/or TMB rate. In one embodiment, the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is incorporated by reference herein.

As provided herein, the expression levels of the at least one of the classifier biomarkers (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) determined, measured or detected from the sample obtained from the subject can then be compared to reference expression levels of the at least one of the classifier biomarkers (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) from at least one sample training set. The at least one sample training set can comprise (i) expression levels of the at least one biomarker from a sample that overexpresses the at least one biomarker or (ii) expression levels from a reference sample for a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)). The sample obtained from the subject can then be classified as a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) or C28 (THCA)) based on the results of the comparing step. 
In one embodiment, the comparing step can comprise applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample obtained from the subject and the expression data from the at least one training set; and classifying the sample obtained from the subject as a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) or C28 (THCA)) based on the results of the statistical algorithm. The statistical algorithm can be any statistical algorithm found in the art and/or provided herein.
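As a hedged illustration of a correlation-based comparing step, the following Python sketch scores a test sample against per-subtype reference profiles using Pearson correlation and picks the best-scoring subtype. The choice of Pearson correlation, the four-gene panel, the function names, and the test sample values are all assumptions made for illustration; the reference values are rounded Table 2 entries for three subtypes, treated here as reference profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Gene order used for every vector below (an illustrative subset of Table 1).
GENES = ["TG", "NAPSA", "KRT6A", "PAX8"]

# Rounded Table 2 entries, assumed here to serve as reference profiles.
REFERENCE = {
    "C4 (Squamous-like)": [0.391, 0.122, 13.048, -0.110],
    "C6 (LUAD-Enriched)": [-0.793, 11.535, 1.835, -0.508],
    "C28 (THCA)": [15.100, 3.231, -1.199, 7.330],
}


def classify_by_correlation(sample, reference):
    """Return (best_subtype, scores) where scores maps subtype -> correlation."""
    scores = {subtype: pearson(sample, profile) for subtype, profile in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical test sample, in the same gene order as GENES.
test_sample = [0.2, 9.8, 2.1, -0.3]
best, scores = classify_by_correlation(test_sample, REFERENCE)
print(best)
```

With only four genes the correlation is illustrative at best; a practical classifier would presumably use many more of the Table 1 genes.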

In one embodiment, the statistical algorithm for the comparing step can be an algorithm that comprises determining a correlation between the expression data obtained from the tumor sample obtained from the subject (i.e., test sample) and centroids constructed from the expression levels or profiles measured or detected for the at least one classifier biomarker (such as the classifier biomarkers of Table 1 or subsets thereof or any additional set of biomarker classifiers or subsets thereof as disclosed herein) from the at least one training set. The COCA subtype for the tumor sample (i.e., test sample) can then be assigned by finding the centroid to which it is nearest from the centroids constructed from the expression data from the at least one training set, using any distance measure, e.g., Euclidean distance or correlation. The centroids can be constructed using any method known in the art for generating centroids such as, for example, those found in Mullins et al. (2007) Clin Chem. 53(7):1273-9 or Dabney (2005) Bioinformatics 21(22):4148-4154. The COCA subtype can then be assigned to the tumor sample obtained from the subject based on the use of a classification to the nearest centroid (CLaNC) algorithm as applied to the expression data generated or measured from the tumor sample and the centroid(s) constructed for the at least one training set. The CLaNC algorithm for use in the methods, compositions and kits provided herein can be the CLaNC algorithm implemented by the CLaNC software found in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives thereof.
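The nearest-centroid assignment described above can be sketched in Python as follows. This is a simplified illustration, not the ClaNC software: it assumes each centroid is the per-gene mean of the training samples for a subtype and uses Euclidean distance. The training values, the three-gene panel (TG, NAPSA, KRT6A), and the function names are hypothetical.

```python
import math
from collections import defaultdict


def build_centroids(training):
    """Average the expression vectors of each subtype.

    `training` is a list of (subtype_label, expression_vector) pairs, with
    every vector using the same gene order.
    """
    grouped = defaultdict(list)
    for label, vector in training:
        grouped[label].append(vector)
    centroids = {}
    for label, vectors in grouped.items():
        n = len(vectors)
        # zip(*vectors) iterates gene by gene across the subtype's samples.
        centroids[label] = [sum(values) / n for values in zip(*vectors)]
    return centroids


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(sample, centroids):
    """Return the subtype label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: euclidean(sample, centroids[label]))


# Hypothetical training set; gene order is TG, NAPSA, KRT6A.
training = [
    ("C28 (THCA)", [14.2, 3.0, -1.0]),
    ("C28 (THCA)", [15.8, 3.4, -1.4]),
    ("C6 (LUAD-Enriched)", [-0.5, 11.0, 2.0]),
    ("C6 (LUAD-Enriched)", [-1.1, 12.0, 1.6]),
    ("C4 (Squamous-like)", [0.2, 0.4, 12.5]),
    ("C4 (Squamous-like)", [0.6, -0.2, 13.5]),
]

centroids = build_centroids(training)
test_sample = [0.9, 1.0, 11.0]  # hypothetical test sample
assigned = nearest_centroid(test_sample, centroids)
print(assigned)
```

Swapping `euclidean` for a correlation-based distance (for example, one minus a correlation coefficient) would give the correlation variant mentioned in the text.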

Sample Types/Methods of Detection

The methods and compositions provided herein allow for the differentiation or diagnosis of a sample obtained from a subject as being a specific COCA subtype. The COCA subtype can be one of 21 integrated, pan-cancer COCA subtypes of cancer selected from C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA). The differentiation, detection or diagnosis of the sample obtained from the subject as being a COCA subtype as provided herein can be accomplished by measuring or detecting the presence and/or level of one or more classifier biomarkers from a publicly available pan-cancer dataset and/or a pan-cancer dataset provided herein (e.g., Table 1). The measuring can be at the nucleic acid or protein level.

A sample for use in any of the methods and compositions provided herein can be a tumor sample obtained from a subject or patient suffering from or suspected of suffering from a type of cancer. The type of cancer can be any type of cancer provided herein and/or known in the art. The tumor sample used for the detection or differentiation methods described herein can be a sample previously determined or diagnosed as a type of cancer sample using traditional tissue-of-origin methods. The previous diagnosis can be based on a histological analysis. The histological analysis can be performed by one or more pathologists.

The sample (e.g., tumor sample) can be any sample (e.g., tumor) isolated from the subject or patient. In one embodiment, the subject or patient is a human subject or patient. For example, in one embodiment, the analysis is performed on biopsies that are embedded in paraffin wax. In one embodiment, the sample can be a fresh frozen tissue sample. In another embodiment, the sample can be a bodily fluid obtained from the patient. The bodily fluid can be blood or fractions thereof (e.g., serum, plasma), urine, saliva, sputum or cerebrospinal fluid (CSF). The sample can contain cellular as well as extracellular sources of nucleic acid or protein for use in the methods provided herein. The extracellular sources can be cell-free DNA and/or exosomes. In one embodiment, the sample can be a cell pellet or a wash. This aspect of the methods provided herein provides a means to improve current diagnostics by accurately identifying the major histological types, even from small biopsies. The methods provided herein, including the RT-PCR methods, are sensitive, precise and have multi-analyte capability for use with paraffin-embedded samples. See, for example, Cronin et al. (2004) Am. J. Pathol. 164(1):35-42, herein incorporated by reference.

Formalin fixation and tissue embedding in paraffin wax is a universal approach for tissue processing prior to light microscopic evaluation. A major advantage afforded by formalin-fixed paraffin-embedded (FFPE) specimens is the preservation of cellular and architectural morphologic detail in tissue sections (Fox et al. (1985) J Histochem Cytochem 33:845-853). The standard buffered formalin fixative in which biopsy specimens are processed is typically an aqueous solution containing 37% formaldehyde and 10-15% methyl alcohol. Formaldehyde is a highly reactive dipolar compound that results in the formation of protein-nucleic acid and protein-protein crosslinks in vitro (Clark et al. (1986) J Histochem Cytochem 34:1509-1512; McGhee and von Hippel (1975) Biochemistry 14:1281-1296, each incorporated by reference herein).

In one embodiment, the sample used herein is obtained from an individual, and comprises formalin-fixed paraffin-embedded (FFPE) tissue. However, other tissue and sample types are amenable for use herein. In one embodiment, the other tissue and sample types can be fresh frozen tissue, wash fluids, cell pellets, or the like. In one embodiment, the sample can be a bodily fluid obtained from the individual. The bodily fluid can be blood or fractions thereof (e.g., serum, plasma), urine, sputum, saliva or cerebrospinal fluid (CSF). A biomarker nucleic acid as provided herein can be extracted from a cell, can be cell free or extracted from an extracellular vesicular entity such as an exosome.

Methods are known in the art for the isolation of nucleic acid (e.g., RNA) from FFPE tissue. In one embodiment, total RNA can be isolated from FFPE tissues as described by Bibikova et al. (2004) American Journal of Pathology 165:1799-1807, herein incorporated by reference. Likewise, the High Pure RNA Paraffin Kit (Roche) can be used. Paraffin is removed by xylene extraction followed by ethanol wash. RNA can be isolated from sectioned tissue blocks using the MasterPure Purification kit (Epicentre, Madison, Wis.); a DNase I treatment step is included. RNA can be extracted from frozen samples using Trizol reagent according to the supplier's instructions (Invitrogen Life Technologies, Carlsbad, Calif.). Samples with measurable residual genomic DNA can be resubjected to DNase I treatment and assayed for DNA contamination. All purification, DNase treatment, and other steps can be performed according to the manufacturer's protocol. After total RNA isolation, samples can be stored at −80° C. until use.

General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., ed., Current Protocols in Molecular Biology, John Wiley & Sons, New York 1987-1999. Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker (Lab Invest. 56:A67, 1987) and De Andres et al. (Biotechniques 18:42-44, 1995). In particular, RNA isolation can be performed using a purification kit, a buffer set and protease from commercial manufacturers, such as Qiagen (Valencia, Calif.), according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include the MasterPure™ Complete DNA and RNA Purification Kit (Epicentre, Madison, Wis.) and the Paraffin Block RNA Isolation Kit (Ambion, Austin, Tex.). Total RNA from tissue samples can be isolated, for example, using RNA Stat-60 (Tel-Test, Friendswood, Tex.). RNA prepared from a tumor can be isolated, for example, by cesium chloride density gradient centrifugation. Additionally, large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, for example, the single-step RNA isolation process of Chomczynski (U.S. Pat. No. 4,843,155, incorporated by reference in its entirety for all purposes).

In one embodiment, a sample comprises cells harvested from a tumor sample. Cells can be harvested from a biological sample using standard techniques known in the art. For example, in one embodiment, cells are harvested by centrifuging a cell sample and resuspending the pelleted cells. The cells can be resuspended in a buffered solution such as phosphate-buffered saline (PBS). After centrifuging the cell suspension to obtain a cell pellet, the cells can be lysed to extract nucleic acid, e.g., messenger RNA. All samples obtained from a subject, including those subjected to any sort of further processing, are considered to be obtained from the subject.

The sample, in one embodiment, is further processed before the detection of the biomarker levels of the combination of biomarkers set forth herein. For example, mRNA in a cell or tissue sample can be separated from other components of the sample. The sample can be concentrated and/or purified to isolate mRNA in its non-natural state, as the mRNA is not in its natural environment. For example, studies have indicated that the higher order structure of mRNA in vivo differs from the in vitro structure of the same sequence (see, e.g., Rouskin et al. (2014) Nature 505, pp. 701-705, incorporated herein in its entirety for all purposes).

mRNA from the sample, in one embodiment, is hybridized to a synthetic DNA probe, which in some embodiments includes a detection moiety (e.g., detectable label, capture sequence, barcode reporting sequence). Accordingly, in these embodiments, a non-natural mRNA-cDNA complex is ultimately made and used for detection of the biomarker. In another embodiment, mRNA from the sample is directly labeled with a detectable label, e.g., a fluorophore. In a further embodiment, the non-natural labeled-mRNA molecule is hybridized to a cDNA probe and the complex is detected.

In one embodiment, once the mRNA is obtained from a sample, it is converted to complementary DNA (cDNA) prior to the hybridization reaction or is used in a hybridization reaction together with one or more cDNA probes. cDNA does not exist in vivo and therefore is a non-natural molecule. Furthermore, cDNA-mRNA hybrids are synthetic and do not exist in vivo. Besides cDNA not existing in vivo, cDNA is necessarily different from mRNA, as it includes deoxyribonucleic acid and not ribonucleic acid. The cDNA is then amplified, for example, by the polymerase chain reaction (PCR) or other amplification method known to those of ordinary skill in the art. For example, other amplification methods that may be employed include the ligase chain reaction (LCR) (Wu and Wallace, Genomics, 4:560 (1989); Landegren et al., Science, 241:1077 (1988), each incorporated by reference in its entirety for all purposes), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA, 86:1173 (1989), incorporated by reference in its entirety for all purposes), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA, 87:1874 (1990), incorporated by reference in its entirety for all purposes), and nucleic acid sequence-based amplification (NASBA). Guidelines for selecting primers for PCR amplification are known to those of ordinary skill in the art. See, e.g., McPherson et al., PCR Basics: From Background to Bench, Springer-Verlag, 2000, incorporated by reference in its entirety for all purposes. The product of this amplification reaction, i.e., amplified cDNA, is also necessarily a non-natural product. First, as mentioned above, cDNA is a non-natural molecule. Second, in the case of PCR, the amplification process serves to create hundreds of millions of cDNA copies for every individual cDNA molecule of starting material. The numbers of copies generated are far removed from the number of copies of mRNA that are present in vivo.
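
The scale of PCR amplification mentioned above can be checked with simple arithmetic. The sketch below assumes an idealized model in which each cycle multiplies the copy number by (1 + efficiency), with efficiency 1.0 meaning perfect doubling; real reactions plateau and rarely sustain that efficiency, and the function names are illustrative rather than taken from the specification.

```python
import math

def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after `cycles` rounds under an idealized exponential model."""
    return initial_copies * (1.0 + efficiency) ** cycles

def cycles_to_reach(target_copies, initial_copies=1, efficiency=1.0):
    """Smallest whole number of cycles giving at least `target_copies`."""
    ratio = target_copies / initial_copies
    return math.ceil(math.log(ratio) / math.log(1.0 + efficiency))

# One starting cDNA molecule, perfect doubling: 2**28 = 268,435,456 copies.
copies_after_28 = pcr_copies(1, 28)

# Cycles needed to pass one hundred million copies from a single molecule.
cycles_for_1e8 = cycles_to_reach(1e8)
```

Under these assumptions a single template passes one hundred million copies in fewer than thirty cycles, which is consistent with the order of magnitude stated in the text.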

In one embodiment, cDNA is amplified with primers that introduce an additional DNA sequence (e.g., adapter, reporter, capture sequence or moiety, barcode) onto the fragments (e.g., with the use of adapter-specific primers), or mRNA or cDNA biomarker sequences are hybridized directly to a cDNA probe comprising the additional sequence (e.g., adapter, reporter, capture sequence or moiety, barcode). Amplification and/or hybridization of mRNA to a cDNA probe therefore serves to create non-natural double stranded molecules from the non-natural single stranded cDNA, or the mRNA, by introducing additional sequences and forming non-natural hybrids. Further, as known to those of ordinary skill in the art, amplification procedures have error rates associated with them. Therefore, amplification introduces further modifications into the cDNA molecules. In one embodiment, during amplification with the adapter-specific primers, a detectable label, e.g., a fluorophore, is added to single strand cDNA molecules. Amplification therefore also serves to create DNA complexes that do not occur in nature, at least because (i) cDNA does not exist in vivo, (ii) adapter sequences are added to the ends of cDNA molecules to make DNA sequences that do not exist in vivo, (iii) the error rate associated with amplification further creates DNA sequences that do not exist in vivo, (iv) the disparate structure of the cDNA molecules as compared to what exists in nature, and (v) the chemical addition of a detectable label to the cDNA molecules.

In some embodiments, the expression of a biomarker of interest is detected at the nucleic acid level via detection of non-natural cDNA molecules.

The biomarkers described herein can include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest, or their non-natural cDNA product, obtained synthetically in vitro in a reverse transcription reaction. The term “fragment” is intended to refer to a portion of the polynucleotide that generally comprises at least 10, 15, 20, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 800, 900, 1,000, 1,200, or 1,500 contiguous nucleotides, or up to the number of nucleotides present in a full-length biomarker polynucleotide disclosed herein. A fragment of a biomarker polynucleotide will generally encode at least 15, 25, 30, 50, 100, 150, 200, or 250 contiguous amino acids, or up to the total number of amino acids present in a full-length biomarker protein as provided herein.

Isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses, probe arrays, and NanoString assays. One method for the detection of mRNA levels involves contacting the isolated mRNA or synthesized cDNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, for example, a cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250, or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to the non-natural cDNA or mRNA biomarker provided herein.
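
As a rough illustration of what it means for a DNA probe to be complementary to an mRNA target, the sketch below derives the reverse-complement DNA oligonucleotide for a stretch of mRNA and locates its perfect-match binding site. It models exact Watson-Crick pairing only; stringency, mismatches and melting temperature are not modeled, and the sequence and function names are hypothetical.

```python
# Base-pairing tables: RNA target base -> DNA probe base, and the reverse.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")

def dna_probe_for(mrna_segment):
    """Return the DNA oligonucleotide (5'->3') complementary to an mRNA segment."""
    return mrna_segment.translate(RNA_TO_DNA_COMPLEMENT)[::-1]

def find_binding_site(mrna, probe):
    """Return the 0-based start of the probe's perfect-match site, or -1."""
    target = probe[::-1].translate(DNA_TO_RNA_COMPLEMENT)
    return mrna.find(target)

# Hypothetical 21-nt mRNA fragment; the probe targets positions 3..17.
mrna = "AUGGCCAUUGUAAUGGGCCGC"
probe = dna_probe_for(mrna[3:18])
site = find_binding_site(mrna, probe)
```

A 15-nucleotide probe is used here only because it is one of the minimum lengths listed above; longer probes follow the same logic.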

In one embodiment, the measuring or detecting step in any method provided herein is at the nucleic acid level by performing RNA-seq, a reverse transcriptase polymerase chain reaction (RT-PCR) or a hybridization assay with oligonucleotides that are substantially complementary to portions of cDNA molecules of the at least one classifier biomarker (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) under conditions suitable for RNA-seq, RT-PCR or hybridization and obtaining expression levels of the at least one classifier biomarker based on the detecting step.

In some embodiments, the method for COCA subtyping includes not only detecting expression levels of a classifier biomarker set in a sample obtained from a subject, but can further comprise detecting expression levels of said classifier biomarker set in one or more control or reference samples. The one or more control or reference samples can be selected from a normal or cancer-free sample, a cancer sample of a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) or any combination thereof. In some embodiments, the detecting includes all of the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein at the nucleic acid level or protein level. In some embodiments, the detecting includes all of the classifier biomarkers of Table 1 at the nucleic acid level or protein level. In another embodiment, a single classifier biomarker, a subset, or a plurality of the classifier biomarkers of Table 1 is detected in a method to determine the COCA subtype, for example, from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 28, from about 28 to about 32, from about 32 to about 36, from about 36 to about 40, from about 40 to about 44, from about 44 to about 48, from about 48 to about 52, from about 52 to about 56, from about 56 to about 60, from about 60 to about 64, from about 64 to about 68, from about 68 to about 72, from about 72 to about 76, or from about 76 to about 80 of the biomarkers in Table 1. In another embodiment, each of the biomarkers from Table 1 is detected in a method to determine the COCA subtype. In another embodiment, any of 84 of the biomarkers from Table 1 are selected as the gene signatures for a specific COCA subtype. The detecting can be performed by any suitable technique including, but not limited to, RNA-seq, a reverse transcriptase polymerase chain reaction (RT-PCR), a microarray hybridization assay, or another hybridization assay, e.g., a NanoString assay, with primers and/or probes specific to the classifier biomarkers, and/or the like. In some cases, the primers useful for the amplification methods (e.g., RT-PCR or qRT-PCR) are any forward and reverse primers suitable for binding to a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein.

As explained above, in one embodiment, once the mRNA is obtained from a sample (e.g., from a subject suffering from or suspected of suffering from cancer or a control subject), it is converted to complementary DNA (cDNA) in a hybridization reaction. Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising sequence that is complementary to a portion of a specific mRNA. Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising random sequence. Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising sequence that is complementary to the poly(A) tail of an mRNA. cDNA does not exist in vivo and therefore is a non-natural molecule. In a further embodiment, the cDNA is then amplified, for example, by the polymerase chain reaction (PCR) or other amplification method known to those of ordinary skill in the art. PCR can be performed with the forward and/or reverse primers comprising sequence complementary to at least a portion of a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein. The product of this amplification reaction, i.e., amplified cDNA, is necessarily a non-natural product. First, as mentioned above, cDNA is a non-natural molecule. Second, in the case of PCR, the amplification process serves to create hundreds of millions of cDNA copies for every individual cDNA molecule of starting material. The number of copies generated is far removed from the number of copies of mRNA that are present in vivo.

In one embodiment, cDNA is amplified with primers that introduce an additional DNA sequence (adapter sequence) onto the fragments (with the use of adapter-specific primers). The adapter sequence can be a tail, wherein the tail sequence is not complementary to the cDNA. For example, the forward and/or reverse primers comprising sequence complementary to at least a portion of a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein, can comprise tail sequence. Amplification therefore serves to create non-natural double stranded molecules from the non-natural single stranded cDNA, by introducing barcode, adapter and/or reporter sequences onto the already non-natural cDNA. In one embodiment, during amplification with the adapter-specific primers, a detectable label, e.g., a fluorophore, is added to single strand cDNA molecules. Amplification therefore also serves to create DNA complexes that do not occur in nature, at least because (i) cDNA does not exist in vivo, (ii) adapter sequences are added to the ends of cDNA molecules to make DNA sequences that do not exist in vivo, (iii) the error rate associated with amplification further creates DNA sequences that do not exist in vivo, (iv) the disparate structure of the cDNA molecules as compared to what exists in nature, and (v) the chemical addition of a detectable label to the cDNA molecules.

In one embodiment, the synthesized cDNA (for example, amplified cDNA) is immobilized on a solid surface via hybridization with a probe, e.g., via a microarray. In another embodiment, cDNA products are detected via real-time polymerase chain reaction (PCR) via the introduction of fluorescent probes that hybridize with the cDNA products. For example, in one embodiment, biomarker detection is assessed by quantitative fluorogenic RT-PCR (e.g., with TaqMan® probes). For PCR analysis, well known methods are available in the art for the determination of primer sequences for use in the analysis.

In one embodiment, the measuring or detecting step in any method provided herein is performed via a hybridization assay that comprises probing the levels of at least one of the classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, at the nucleic acid level, in a tumor sample obtained from the patient. The probing step, in one embodiment, comprises mixing the sample with one or more oligonucleotides that are substantially complementary to portions of cDNA molecules of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, under conditions suitable for hybridization of the one or more oligonucleotides to their complements or substantial complements; detecting whether hybridization occurs between the one or more oligonucleotides and their complements or substantial complements; and obtaining hybridization values of the at least one classifier biomarker based on the detecting step. The hybridization values of the at least one classifier biomarker are then compared to reference hybridization value(s) from at least one sample training set. The tumor sample is classified, for example, as a COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step. In one embodiment, the hybridization values of the tumor sample can be compared to centroid(s) constructed from the hybridization values of the training set.

In one embodiment, the hybridization reaction utilized in methods provided herein employs a capture probe and/or a reporter probe. For example, the hybridization probe is a probe derivatized to a solid surface such as a bead, glass or silicon substrate. In another embodiment, the capture probe is present in solution and mixed with the patient's sample, followed by attachment of the hybridization product to a surface, e.g., via a biotin-avidin interaction (e.g., where biotin is a part of the capture probe and avidin is on the surface). The hybridization assay, in one embodiment, employs both a capture probe and a reporter probe. The reporter probe can hybridize to either the capture probe or the biomarker nucleic acid. Reporter probes, for example, are then counted and detected to determine the level of biomarker(s) in the sample. The capture and/or reporter probe, in one embodiment, contains a detectable label and/or a group that allows functionalization to a surface.

For example, the nCounter gene analysis system (see, e.g., Geiss et al. (2008) Nat. Biotechnol. 26, pp. 317-325, incorporated by reference in its entirety for all purposes) is amenable for use with the methods provided herein.

Hybridization assays described in U.S. Pat. Nos. 7,473,767 and 8,492,094, the disclosures of which are incorporated by reference in their entireties for all purposes, are amenable for use with the methods provided herein, i.e., to detect the biomarkers and biomarker combinations described herein.

Biomarker levels may be monitored using a membrane blot (such as used in hybridization analysis such as Northern, Southern, dot, and the like), or microwells, sample tubes, gels, beads, or fibers (or any solid support comprising bound nucleic acids). See, for example, U.S. Pat. Nos. 5,770,722, 5,874,219, 5,744,305, 5,677,195 and 5,445,934, each incorporated by reference in their entireties.

In one embodiment, microarrays are used to detect biomarker levels. Microarrays are particularly well suited for this purpose because of the reproducibility between different experiments. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See, for example, U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860, and 6,344,316, each incorporated by reference in their entireties. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.
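
One common way to turn scanned probe intensities into relative expression values is sketched below. The specific steps (background subtraction, a floor to avoid taking the logarithm of non-positive values, log2 transformation and median centering) are assumptions chosen for illustration; the specification does not prescribe a particular conversion, and the function name and toy intensities are hypothetical.

```python
import numpy as np

def relative_expression_from_intensity(raw, background, floor=1.0):
    """Convert raw probe intensities to median-centered log2 values."""
    raw = np.asarray(raw, dtype=float)
    # Remove the background signal; clamp so the log is always defined.
    corrected = np.maximum(raw - background, floor)
    log_intensity = np.log2(corrected)
    # Center on the array median so values are relative within the array.
    return log_intensity - np.median(log_intensity)

# Hypothetical scan of four probes with a uniform background of 100 units.
raw_intensities = [132.0, 164.0, 228.0, 1124.0]
relative = relative_expression_from_intensity(raw_intensities, background=100.0)
```

Values above zero then indicate probes brighter than the array median, and each unit corresponds to a two-fold difference in corrected intensity.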

Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, for example, U.S. Pat. No. 5,384,261. Although a planar array surface is generally used, the array can be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays can be nucleic acids (or peptides) on beads, gels, polymeric surfaces, fibers (such as fiber optics), glass, or any other appropriate substrate. See, for example, U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, each incorporated by reference in their entireties. Arrays can be packaged in such a manner as to allow for diagnostics or other manipulation of an all-inclusive device. See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591, each incorporated by reference in their entireties.

Serial analysis of gene expression (SAGE), in one embodiment, is employed in the methods described herein. SAGE is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many such tags are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. See Velculescu et al. Science 270:484-87, 1995; Cell 88:243-51, 1997, each incorporated by reference in its entirety.
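
The tag-counting idea behind SAGE can be sketched as follows. This is a simplified toy model, not the published protocol: it assumes that 10 bp tags in a sequenced concatemer are separated by a fixed anchoring sequence (CATG is used here), ignores ditag structure and sequencing errors, and uses a made-up tag-to-gene lookup table with placeholder gene names.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed separator between tags in the concatemer

# Hypothetical lookup from 10 bp tag to gene; real tables come from a reference.
TAG_TO_GENE = {
    "AAACCCGGGT": "GENE_A",
    "TTTGGGCCCA": "GENE_B",
    "ACGTACGTAC": "GENE_C",
}

def extract_tags(concatemer, tag_len=10):
    """Split a sequenced concatemer on the anchor and keep full-length tags."""
    return [piece for piece in concatemer.split(ANCHOR) if len(piece) == tag_len]

def tag_abundance(concatemers):
    """Count how often each gene's tag appears across all concatemers."""
    counts = Counter()
    for concatemer in concatemers:
        for tag in extract_tags(concatemer):
            counts[TAG_TO_GENE.get(tag, "unassigned")] += 1
    return counts

# Two toy concatemers built from known tags plus one unknown tag.
reads = [
    ANCHOR.join(["AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT"]),
    ANCHOR.join(["AAACCCGGGT", "ACGTACGTAC", "GGGGGGGGGG"]),
]
abundance = tag_abundance(reads)
```

The resulting counts play the role of expression levels: a gene whose tag is seen three times is taken to be more abundant than one seen once.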

In another embodiment, the measuring or detecting step in any method provided herein is performed via an amplification assay. The amplification assay can be coupled with a sequencing method. In one embodiment, a method of biomarker level analysis at the nucleic acid level as provided herein utilizes an amplification reaction coupled with a sequencing method such as, for example, RNAseq, next generation sequencing, and massively parallel signature sequencing (MPSS) as described by Brenner et al. (Nat. Biotech. 18:630-34, 2000, incorporated by reference in its entirety). MPSS is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3.0×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.

The expression level values of the at least one classifier biomarker obtained from the amplification and/or sequencing assay are then compared to reference expression level value(s) from at least one sample training set. The tumor sample is classified, for example, as a COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step. In one embodiment, the expression level values of the tumor sample can be compared to centroid(s) constructed from the expression level values obtained from the training set.

Another method of biomarker level analysis at the nucleic acid level for use in any method provided herein is the use of an amplification method such as, for example, RT-PCR or quantitative RT-PCR (qRT-PCR). Methods for determining the level of biomarker mRNA in a sample may involve the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in Mullis, 1987, U.S. Pat. No. 4,683,202), ligase chain reaction (Barany (1991) Proc. Natl. Acad. Sci. USA 88:189-193), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al. (1988) Bio/Technology 6:1197), rolling circle replication (Lizardi et al., U.S. Pat. No. 5,854,033) or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art. Numerous different PCR or qRT-PCR protocols are known in the art and can be directly applied or adapted for use using the presently described compositions for the detection and/or quantification of expression of discriminative genes in a sample. See, for example, Fan et al. (2004) Genome Res. 14:878-885, herein incorporated by reference. Generally, in PCR, a target polynucleotide sequence is amplified by reaction with at least one oligonucleotide primer or pair of oligonucleotide primers. The primer(s) hybridize to a complementary region of the target nucleic acid and a DNA polymerase extends the primer(s) to amplify the target sequence. Under conditions sufficient to provide polymerase-based nucleic acid amplification products, a nucleic acid fragment of one size dominates the reaction products (the target polynucleotide sequence which is the amplification product). The amplification cycle is repeated to increase the concentration of the single target polynucleotide sequence. The reaction can be performed in any thermocycler commonly used for PCR.

Quantitative RT-PCR (qRT-PCR) (also referred to as real-time RT-PCR) is preferred under some circumstances because it provides not only a quantitative measurement, but also reduced time and contamination. As used herein, “quantitative PCR” (or “real time qRT-PCR”) refers to the direct monitoring of the progress of a PCR amplification as it is occurring without the need for repeated sampling of the reaction products. In quantitative PCR, the reaction products may be monitored via a signaling mechanism (e.g., fluorescence) as they are generated and are tracked after the signal rises above a background level but before the reaction reaches a plateau. The number of cycles required to achieve a detectable or “threshold” level of fluorescence varies inversely with the logarithm of the concentration of amplifiable targets at the beginning of the PCR process, enabling a measure of signal intensity to provide a measure of the amount of target nucleic acid in a sample in real time. A DNA binding dye (e.g., SYBR green) or a labeled probe can be used to detect the extension product generated by PCR amplification. Any probe format utilizing a labeled probe comprising the sequences provided herein may be used.
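
Because each cycle ideally doubles the product, a sample that reaches the fluorescence threshold in fewer cycles started with more template. One widely used way to turn threshold cycle (Ct) values into a relative expression estimate is the comparative 2^(−ΔΔCt) calculation, sketched below. It is offered as an illustration and is not a method recited in the specification; it assumes near-perfect amplification efficiency for both the biomarker and a reference (housekeeping) gene, and all Ct values and names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt)."""
    # Normalize the target to the reference gene within each specimen.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    # Compare the normalized values; lower Ct means more starting template.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: the biomarker crosses threshold 3 cycles earlier
# in the tumor sample than in the control, with equal reference-gene Ct.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_reference_sample=18.0,
    ct_target_control=25.0, ct_reference_control=18.0,
)
```

A three-cycle difference corresponds to 2³, that is, an eight-fold higher estimated level of the biomarker transcript in the sample under these assumptions.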

Immunohistochemistry methods are also suitable for detecting the levels of the biomarkers provided herein. Samples can be frozen for later preparation or immediately placed in a fixative solution. Tissue samples can be fixed by treatment with a reagent, such as formalin, glutaraldehyde, methanol, or the like and embedded in paraffin. Methods for preparing slides for immunohistochemical analysis from formalin-fixed, paraffin-embedded tissue samples are well known in the art.

In one embodiment, COCA subtypes can be evaluated using levels of protein expression of one or more of the classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein. The level of protein expression can be measured using an immunological detection method. Immunological detection methods which can be used herein include, but are not limited to, competitive and non-competitive assay systems using techniques such as Western blots, radioimmunoassays, ELISA (enzyme linked immunosorbent assay), “sandwich” immunoassays, immunoprecipitation assays, precipitin reactions, gel diffusion precipitin reactions, immunodiffusion assays, agglutination assays, complement-fixation assays, immunoradiometric assays, fluorescent immunoassays, protein A immunoassays, and the like. Such assays are routine and well known in the art (see, e.g., Ausubel et al., eds., 1994, Current Protocols in Molecular Biology, Vol. I, John Wiley & Sons, Inc., New York, which is incorporated by reference herein in its entirety).

In one embodiment, antibodies specific for biomarker proteins are utilized to detect the expression of a biomarker protein in a sample (e.g., tumor sample). The method comprises obtaining a sample from a patient or a subject, contacting the sample with at least one antibody directed to a biomarker that is selectively expressed in cancer cells, and detecting antibody binding to determine if the biomarker is expressed in the patient sample. Also provided herein is an immunocytochemistry technique for diagnosing COCA subtypes. One of skill in the art will recognize that the immunocytochemistry method described herein below may be performed manually or in an automated fashion.

In some embodiments, the expression level of a classifier biomarker(s) (e.g., from Table 1) as determined using any methods or compositions provided herein or its expression product, is determined by normalization to the level of reference nucleic acid(s) (e.g., RNA transcripts) or their expression products (e.g., proteins), which can be all measured nucleic acids (e.g., transcripts (or their products)) in the sample or a particular reference set of nucleic acids (e.g., RNA transcripts (or their non-natural cDNA products)). Normalization is performed to correct for or normalize away both differences in the amount of nucleic acid (e.g., RNA or cDNA) assayed and variability in the quality of the nucleic acid (e.g., RNA or cDNA) used. Therefore, an assay typically measures and incorporates the expression of certain normalizing genes, including well known housekeeping genes, such as, for example, GAPDH and/or β-Actin. Alternatively, normalization can be based on the mean or median signal of all of the assayed biomarkers or a large subset thereof (global normalization approach).
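As an illustration only, the two normalization strategies described above (reference-gene normalization and global normalization) can be sketched as follows. The sketch assumes that expression levels are already on a log2 scale, so that normalization becomes a subtraction; the function names and the biomarker names BIOMARKER_A and BIOMARKER_B are hypothetical and are not taken from Table 1.

```python
from statistics import mean, median

def normalize_to_reference(levels, reference_genes):
    """Subtract the mean log2 level of the reference (housekeeping) genes.

    levels: dict mapping gene name -> log2 expression level (assumed scale).
    reference_genes: names of the normalizing genes, e.g. GAPDH.
    Returns normalized levels for the non-reference genes only.
    """
    ref_level = mean(levels[g] for g in reference_genes)
    return {g: v - ref_level for g, v in levels.items()
            if g not in reference_genes}

def normalize_global(levels):
    """Global normalization: subtract the median signal of all assayed genes."""
    med = median(levels.values())
    return {g: v - med for g, v in levels.items()}

# Toy, made-up log2 values for one sample
sample = {"GAPDH": 10.0, "ACTB": 12.0, "BIOMARKER_A": 13.0, "BIOMARKER_B": 9.0}
print(normalize_to_reference(sample, ["GAPDH", "ACTB"]))
print(normalize_global(sample))
```

Using the mean of several reference genes rather than a single gene is one possible design choice; the text above equally permits a single housekeeping gene or a mean-based global approach.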

In one embodiment, the levels of the biomarkers provided herein, such as the classifier biomarkers of Table 1 (or subsets thereof, for example, 1 to 4, 4 to 8, 8 to 12, 12 to 16, 16 to 20, 20 to 24, 24 to 28, 28 to 32, 32 to 36, 36 to 40, 40 to 44, 44 to 48, 48 to 52, 52 to 56, 56 to 60, 60 to 64, 64 to 68, 68 to 72, 72 to 76, 76 to 80, 80 to 84 of the classifier biomarkers) are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample. In one embodiment, the levels of the biomarkers provided herein, such as any of the additional set of classifier biomarkers disclosed herein, are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample.

Statistical Methods

As provided throughout, the methods set forth herein provide a method for determining the COCA subtype of a patient. Once the biomarker levels (e.g., Table 1 or any other gene signature provided herein) are determined, for example by measuring non-natural cDNA biomarker levels or non-natural mRNA-cDNA biomarker complexes, the biomarker levels are compared to reference values or a reference sample as provided herein, for example with the use of statistical methods or direct comparison of detected levels, to make a determination of the COCA subtype. Based on the comparison, the patient's tumor sample is classified, e.g., as a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA).

In one embodiment, expression level values of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1, are compared to reference expression level value(s) from at least one sample training set, wherein the at least one sample training set comprises expression level values from a reference sample(s). In a further embodiment, the at least one sample training set comprises expression level values of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, from a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) or a combination thereof.

In a separate embodiment, for methods provided herein employing a hybridization assay, hybridization values of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, are compared to reference hybridization value(s) from at least one sample training set, wherein the at least one sample training set comprises hybridization values from a reference sample(s). In a further embodiment, the at least one sample training set comprises hybridization values of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, from a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) or a combination thereof. Methods for comparing detected levels of biomarkers to reference values and/or reference samples are provided herein. Based on this comparison, in one embodiment a correlation between the biomarker levels obtained from the subject's sample and the reference values is obtained. An assessment of the COCA subtype is then made.

Various statistical methods can be used to aid in the comparison of the biomarker levels obtained from the patient and reference biomarker levels, for example, from at least one sample training set.

In one embodiment, a supervised pattern recognition method is employed. Examples of supervised pattern recognition methods can include, but are not limited to, the nearest centroid methods (Dabney (2005) Bioinformatics 21(22):4148-4154 and Tibshirani et al. (2002) Proc. Natl. Acad. Sci. USA 99(10):6567-6572); soft independent modeling of class analysis (SIMCA) (see, for example, Wold, 1976); partial least squares analysis (PLS) (see, for example, Wold, 1966; Joreskog, 1982; Frank, 1984; Bro, R., 1997); linear discriminant analysis (LDA) (see, for example, Nilsson, 1965); K-nearest neighbor analysis (KNN) (see, for example, Brown et al., 1996); artificial neural networks (ANN) (see, for example, Wasserman, 1989; Anker et al., 1992; Hare, 1994); probabilistic neural networks (PNNs) (see, for example, Parzen, 1962; Bishop, 1995; Specht, 1990; Broomhead et al., 1988; Patterson, 1996); rule induction (RI) (see, for example, Quinlan, 1986); and Bayesian methods (see, for example, Bretthorst, 1990a, 1990b, 1988). In one embodiment, the classifier for identifying COCA subtypes based on gene expression data is used in a centroid based method as described in Mullins et al. (2007) Clin Chem. 53(7):1273-9, which is incorporated herein by reference in its entirety. In another embodiment, the classifier for identifying tumor subtypes based on gene expression data is used in a nearest centroid based method as described in Dabney (2005) Bioinformatics 21(22):4148-4154, which is incorporated herein by reference in its entirety. The nearest centroid based method can be performed using ClaNC software as described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives thereof.

In other embodiments, an unsupervised training approach is employed, and therefore, no training set is used.

Referring again to sample training sets for supervised learning approaches, in some embodiments, a sample training set(s) can include expression data of a plurality or all of the classifier biomarkers (e.g., all the classifier biomarkers of Table 1) from a specific COCA subtype sample. The plurality of classifier biomarkers can comprise at least 4 classifier biomarkers, at least 8 classifier biomarkers, at least 12 classifier biomarkers, at least 16 classifier biomarkers, at least 20 classifier biomarkers, at least 24 classifier biomarkers, at least 28 classifier biomarkers, at least 32 classifier biomarkers, at least 36 classifier biomarkers, at least 40 classifier biomarkers, at least 44 classifier biomarkers, at least 48 classifier biomarkers, at least 52 classifier biomarkers, at least 56 classifier biomarkers, at least 60 classifier biomarkers, at least 64 classifier biomarkers, at least 68 classifier biomarkers, at least 72 classifier biomarkers, at least 76 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some embodiments, the plurality of classifier biomarkers comprises all 84 biomarkers of Table 1. In some embodiments, the sample training set(s) are normalized to remove sample-to-sample variation.

In some embodiments, comparing can include applying a statistical algorithm, such as, for example, any suitable multivariate statistical analysis model, which can be parametric or non-parametric. In some embodiments, applying the statistical algorithm can include determining a correlation between the expression data obtained from the tumor sample obtained from the subject suffering from or suspected of suffering from cancer (i.e., the test subject) and the expression data from the COCA subtyping training set(s). In some embodiments, cross-validation is performed, such as, for example, leave-one-out cross-validation (LOOCV). In some embodiments, integrative correlation is performed. In some embodiments, a Spearman correlation is performed. In some embodiments, a centroid based method based on gene expression data is employed for the statistical algorithm. The centroids can be constructed using any method known in the art for generating centroids such as, for example, those found in Mullins et al. (2007) Clin Chem. 53(7):1273-9 or the nearest centroid method found in Dabney (2005) Bioinformatics 21(22):4148-4154, each of which is herein incorporated by reference in its entirety. In one embodiment, a correlation analysis is performed on the expression data obtained from the tumor sample and the centroid(s) constructed from the expression data from the COCA training set(s). The correlation analysis can be a Spearman correlation or a Pearson correlation. In one embodiment, a distance measure analysis (e.g., Euclidean distance) is performed on the expression data obtained from the tumor sample and the centroid(s) constructed on the expression data from the COCA training set(s).

Results of the gene expression analysis performed on a sample from a subject (test sample) may be compared to a biological sample(s) or data derived from a biological sample(s) (e.g., expression data or levels from at least one classifier biomarker provided herein, e.g., Table 1) that is known or suspected to be normal (“reference sample” or “normal sample”, e.g., non-cancer sample). In some embodiments, a reference sample or reference gene expression data (e.g., expression data or levels from at least one classifier biomarker provided herein, e.g., Table 1) is obtained or derived from an individual known to have a particular COCA subtype of cancer, e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA. In one embodiment, the gene expression levels or profile measured for the at least one classifier biomarker from Table 1 in the test sample (i.e., tumor sample obtained from the subject) may be compared to centroids constructed from the gene expression analysis performed on the reference or normal sample or training set, and classification can be based on determining which is the nearest centroid based on a distance measure such as, for example, a Euclidean distance or a correlation. The centroids can be constructed using any of the methods provided herein such as, for example, using the ClaNC software described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives related thereto. Classification or determination of the subtype of the test sample can then be ascertained by determining the nearest centroid from the reference or normal sample to which the expression levels or profile from said test sample is nearest based on a distance measure or correlation. The distance measure can be a Euclidean distance.
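By way of a non-limiting sketch, nearest-centroid classification using either a Euclidean distance or a Spearman correlation might look like the following. This is not the ClaNC implementation; it is a simplified illustration in which each centroid is the per-gene mean of the training profiles for a subtype. All function names and the expression values are hypothetical, and the two subtype labels are used only as example keys.

```python
import math

def build_centroids(training):
    """training: dict subtype -> list of expression profiles (lists of floats).
    Returns dict subtype -> centroid (per-gene mean across profiles)."""
    return {subtype: [sum(col) / len(col) for col in zip(*profiles)]
            for subtype, profiles in training.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    # Assumes neither vector is constant (non-zero variance).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(_ranks(a), _ranks(b))

def classify(sample, centroids, method="euclidean"):
    """Return the subtype whose centroid is nearest to the sample."""
    if method == "euclidean":
        return min(centroids, key=lambda s: euclidean(sample, centroids[s]))
    return max(centroids, key=lambda s: spearman(sample, centroids[s]))

# Toy, made-up training data: three genes, two subtypes
training = {
    "C3 OV": [[8.0, 2.0, 5.0], [9.0, 1.0, 6.0]],
    "C14 PRAD": [[2.0, 9.0, 4.0], [1.0, 8.0, 5.0]],
}
centroids = build_centroids(training)
print(classify([7.5, 2.5, 5.0], centroids, method="euclidean"))
print(classify([7.5, 2.5, 5.0], centroids, method="spearman"))
```

A correlation-based rule is insensitive to overall shifts in signal level between platforms, whereas a Euclidean rule is not; which is preferable depends on how the test and training data were normalized.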

The reference sample may be assayed at the same time, or at a different time from the sample obtained from the test subject (i.e., test sample). Alternatively, the biomarker level information from a reference sample may be stored in a database or other means for access at a later date.

The biomarker level results of an assay on the test sample may be compared to the results of the same assay on a reference sample. In some cases, the results of the assay on the reference sample are from a database, or a reference value(s). In some cases, the results of the assay on the reference sample are a known or generally accepted value or range of values by those skilled in the art. In some cases, the comparison is qualitative. In other cases, the comparison is quantitative. In some cases, qualitative or quantitative comparisons may involve but are not limited to one or more of the following: comparing expression levels of a test sample to gene centroids constructed from expression level data from a reference sample (e.g., constructed from expression level data for one or a plurality of genes from Table 1), fluorescence values, spot intensities, absorbance values, chemiluminescent signals, histograms, critical threshold values, statistical significance values, expression levels of the genes described herein, and mRNA copy numbers.

In one embodiment, an odds ratio (OR) is calculated for each biomarker level panel measurement. Here, the OR is a measure of association between the measured biomarker values for the patient and an outcome, e.g., COCA subtype. For example, see, J. Can. Acad. Child Adolesc. Psychiatry 2010; 19(3): 227-229, which is incorporated by reference in its entirety for all purposes.

In one embodiment, a specified statistical confidence level may be determined in order to provide a confidence level regarding the COCA subtype. For example, it may be determined that a confidence level of greater than 90% may be a useful predictor of the COCA subtype. In other embodiments, more or less stringent confidence levels may be chosen. For example, a confidence level of about or at least about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, 99.5%, or 99.9% may be chosen. The confidence level provided may in some cases be related to the quality of the sample, the quality of the data, the quality of the analysis, the specific methods used, and/or the number of gene expression values (i.e., the number of genes) analyzed. The specified confidence level for providing the likelihood of response may be chosen on the basis of the expected number of false positives or false negatives. Methods for choosing parameters for achieving a specified confidence level or for identifying markers with diagnostic power include but are not limited to Receiver Operating Characteristic (ROC) curve analysis, binomial ROC, principal component analysis, odds ratio analysis, partial least squares analysis, singular value decomposition, least absolute shrinkage and selection operator analysis, least angle regression, and the threshold gradient directed regularization method.

Determining the COCA subtype in some cases can be improved through the application of algorithms designed to normalize and/or improve the reliability of the gene expression data. In some embodiments, the data analysis utilizes a computer or other device, machine or apparatus for application of the various algorithms described herein due to the large number of individual data points that are processed. A “machine learning algorithm” refers to a computational-based prediction methodology, also known to persons skilled in the art as a “classifier,” employed for characterizing a gene expression profile or profiles, e.g., to determine the COCA subtype. The biomarker levels, determined by, e.g., microarray-based hybridization assays, sequencing assays, NanoString assays, etc., are in one embodiment subjected to the algorithm in order to classify the profile. Supervised learning generally involves “training” a classifier to recognize the distinctions among COCA subtypes such as, for example, C1 ACC/PCPG positive, C2 GBM/LGG positive, C3 OV positive, C4 Squamous-like positive, C6 LUAD-Enriched positive, C8 PAAD/some STAD positive, C9 UCS positive, C10 BRCA/Basal positive, C12 UCEC positive, C14 PRAD positive, C15 CESC (subset of cervical) positive, C16 BLCA positive, C17 TGCT positive, C19 COAD/READ positive, C20 SARC/MESO positive, C21 KIRC/KICH/KIRP positive, C22 Liver positive, C24 BRCA/Luminal positive, C25 THYM positive, C26 SKCM/UVM positive and C28 THCA positive, and then “testing” the accuracy of the classifier on an independent test set. Therefore, for new, unknown samples the classifier can be used to predict, for example, the class (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) to which the samples belong. The machine learning algorithm can be a ClaNC algorithm as provided herein.

In some embodiments, a robust multi-array average (RMA) method may be used to normalize raw data. The RMA method begins by computing background-corrected intensities for each matched cell on a number of microarrays. In one embodiment, the background corrected values are restricted to positive values as described by Irizarry et al. (2003) Biostatistics 4(2): 249-64, incorporated by reference in its entirety for all purposes. After background correction, the base-2 logarithm of each background corrected matched-cell intensity is then obtained. The background corrected, log-transformed, matched intensity on each microarray is then normalized using the quantile normalization method, in which, for each input array and each probe value, the array percentile probe value is replaced with the average of all array percentile points; this method is more completely described by Bolstad et al. Bioinformatics 2003, incorporated by reference in its entirety. Following quantile normalization, the normalized data may then be fit to a linear model to obtain an intensity measure for each probe on each microarray. Tukey's median polish algorithm (Tukey, J. W., Exploratory Data Analysis. 1977, incorporated by reference in its entirety for all purposes) may then be used to determine the log-scale intensity level for the normalized probe set data.
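The log-transformation and quantile normalization steps described above can be illustrated with the following minimal sketch. It covers only those two steps and is not a full RMA implementation (background correction and median polish are omitted). The function names are hypothetical, the intensities are made up, and ties within an array are resolved in input order, which is a simplification.

```python
import math

def log2_transform(intensities):
    """Base-2 logarithm of background-corrected (positive) intensities."""
    return [math.log2(v) for v in intensities]

def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one list of probe values per array.

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so that all arrays end up with the same distribution.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all arrays
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays)
                  for k in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Toy example: two arrays, three probes each (already on a log scale)
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```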

Various other software programs may be implemented. In certain methods, feature selection and model estimation may be performed by logistic regression with lasso penalty using glmnet (Friedman et al. (2010). Journal of statistical software 33(1): 1-22, incorporated by reference in its entirety). Raw reads may be aligned using TopHat (Trapnell et al. (2009). Bioinformatics 25(9): 1105-11, incorporated by reference in its entirety). In some methods, top features (N ranging from 10 to 200) are used to train a linear support vector machine (SVM) (Suykens J A K, Vandewalle J. Least Squares Support Vector Machine Classifiers. Neural Processing Letters 1999; 9(3): 293-300, incorporated by reference in its entirety) using the e1071 library (Meyer D. Support vector machines: the interface to libsvm in package e1071. 2014, incorporated by reference in its entirety). Confidence intervals, in one embodiment, are computed using the pROC package (Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics 2011; 12: 77, incorporated by reference in its entirety).

In addition, data may be filtered to remove data that may be considered suspect. In one embodiment, data derived from microarray probes that have fewer than about 4, 5, 6, 7 or 8 guanosine+cytosine nucleotides may be considered to be unreliable due to their aberrant hybridization propensity or secondary structure issues. Similarly, data deriving from microarray probes that have more than about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 guanosine+cytosine nucleotides may in one embodiment be considered unreliable due to their aberrant hybridization propensity or secondary structure issues.
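A simple sketch of such a guanosine+cytosine filter is shown below. The lower and upper cutoffs (6 and 17) are arbitrary picks from within the ranges recited above and are assumptions made for illustration, as are the function names and the 25-nucleotide probe sequences.

```python
def gc_count(probe_sequence):
    """Number of G and C nucleotides in a probe sequence."""
    seq = probe_sequence.upper()
    return seq.count("G") + seq.count("C")

def passes_gc_filter(probe_sequence, min_gc=6, max_gc=17):
    """True if the probe has between min_gc and max_gc G+C nucleotides.

    min_gc and max_gc are illustrative values, not prescribed cutoffs.
    """
    return min_gc <= gc_count(probe_sequence) <= max_gc

# Hypothetical 25-mer probes
probes = {
    "probe_low_gc": "ATATATATATATATATATATATATA",
    "probe_mid_gc": "ATGCATGCATGCATGCATGCATGCA",
    "probe_high_gc": "GCGCGCGCGCGCGCGCGCGCGCGCG",
}
kept = [name for name, seq in probes.items() if passes_gc_filter(seq)]
print(kept)
```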

In some embodiments, data from probe-sets may be excluded from analysis if they are not identified at a detectable level (above background).

In some embodiments, probe-sets that exhibit no or low variance may be excluded from further analysis. Low-variance probe-sets are excluded from the analysis via a Chi-Square test. In one embodiment, a probe-set is considered to be low-variance if its transformed variance is to the left of the 99 percent confidence interval of the Chi-Squared distribution with (N−1) degrees of freedom: (N−1) × (Probe-set Variance)/(Gene Probe-set Variance) ~ Chi-Sq(N−1), where N is the number of input CEL files, (N−1) is the degrees of freedom for the Chi-Squared distribution, and the “probe-set variance for the gene” is the average of probe-set variances across the gene. In some embodiments, probe-sets for a given mRNA or group of mRNAs may be excluded from further analysis if they contain less than a minimum number of probes that pass through the previously described filter steps for GC content, reliability, variance and the like. For example, in some embodiments, probe-sets for a given gene or transcript cluster may be excluded from further analysis if they contain less than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or less than about 20 probes.

Methods of biomarker level data analysis, in one embodiment, further include the use of a feature selection algorithm as provided herein. In some embodiments, feature selection is provided by use of the LIMMA software package (Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds.), Springer, New York, pages 397-420, incorporated by reference in its entirety for all purposes).

Methods of biomarker level data analysis, in one embodiment, include the use of a pre-classifier algorithm. For example, an algorithm may use a specific molecular fingerprint to pre-classify the samples according to their composition and then apply a correction/normalization factor. This data/information may then be fed into a final classification algorithm which would incorporate that information to aid in the final diagnosis.

Methods of biomarker level data analysis, in one embodiment, further include the use of a classifier algorithm as provided herein. In one embodiment, a diagonal linear discriminant analysis, k-nearest neighbor algorithm, support vector machine (SVM) algorithm, linear support vector machine, random forest algorithm, or a probabilistic model-based method or a combination thereof is provided for classification of microarray data. In some embodiments, identified markers that distinguish samples (e.g., of varying biomarker level profiles and/or varying COCA subtypes of cancer) are selected based on statistical significance of the difference in biomarker levels between classes of interest. In some cases, the statistical significance is adjusted by applying a Benjamini-Hochberg or another correction for false discovery rate (FDR).
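For illustration, a Benjamini-Hochberg adjustment of a list of p-values can be sketched as follows. The function name and the example p-values are hypothetical; the sketch returns adjusted p-values in the same order as the input, which can then be compared against a chosen FDR threshold.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order.

    The i-th smallest p-value (rank i of m) is scaled by m / i, and a
    running minimum taken from the largest rank downward keeps the
    adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values for four candidate markers
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
selected = [i for i, q in enumerate(adj) if q <= 0.05]
print(adj, selected)
```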

In some cases, the classifier algorithm may be supplemented with a meta-analysis approach such as that described by Fishel and Kaufman et al. 2007 Bioinformatics 23(13): 1599-606, incorporated by reference in its entirety for all purposes. In some cases, the classifier algorithm may be supplemented with a meta-analysis approach such as a repeatability analysis.

Methods for deriving and applying posterior probabilities to the analysis of biomarker level data are known in the art and have been described for example in Smyth, G. K. 2004 Stat. Appl. Genet. Mol. Biol. 3: Article 3, incorporated by reference in its entirety for all purposes. In some cases, the posterior probabilities may be used in the methods provided herein to rank the markers provided by the classifier algorithm.

A statistical evaluation of the results of the biomarker level profiling may provide a quantitative value or values indicative of one or more of the following: COCA subtype of cancer; the likelihood of the success of a particular therapeutic intervention, e.g., angiogenesis inhibitor therapy, chemotherapy, or immunotherapy. In one embodiment, the data is presented directly to the physician in its most useful form to guide patient care, or is used to define patient populations in clinical trials or a patient population for a given medication. The results of the molecular profiling can be statistically evaluated using a number of methods known to the art including, but not limited to: Student's t-test, the two-sided t-test, Pearson rank sum analysis, hidden Markov model analysis, analysis of q-q plots, principal component analysis, one-way ANOVA, two-way ANOVA, LIMMA and the like.

In some cases, accuracy may be determined by tracking the subject over time to determine the accuracy of the original diagnosis. In other cases, accuracy may be established in a deterministic manner or using statistical methods. For example, receiver operating characteristic (ROC) analysis may be used to determine the optimal assay parameters to achieve a specific level of accuracy, specificity, positive predictive value, negative predictive value, and/or false discovery rate.

In some cases, the results of the biomarker level profiling assays are entered into a database for access by representatives or agents of a molecular profiling business, the individual, a medical provider, or insurance provider. In some cases, assay results include sample classification, identification, or diagnosis by a representative, agent or consultant of the business, such as a medical professional. In other cases, a computer or algorithmic analysis of the data is provided automatically. In some cases, the molecular profiling business may bill the individual, insurance provider, medical provider, researcher, or government entity for one or more of the following: molecular profiling assays performed, consulting services, data analysis, reporting of results, or database access.

In some embodiments, the results of the biomarker level profiling assays are presented as a report on a computer screen or as a paper record. In some embodiments, the report may include, but is not limited to, such information as one or more of the following: the levels of biomarkers (e.g., as reported by copy number or fluorescence intensity, etc.) as compared to the reference sample or reference value(s); the likelihood the subject will respond to a particular therapy, based on the biomarker level values and the COCA subtype; and proposed therapies.

In one embodiment, the results of the gene expression profiling may be classified into one or more of the following: C1 ACC/PCPG positive, C2 GBM/LGG positive, C3 OV positive, C4 Squamous-like positive, C6 LUAD-Enriched positive, C8 PAAD/some STAD positive, C9 UCS positive, C10 BRCA/Basal positive, C12 UCEC positive, C14 PRAD positive, C15 CESC (subset of cervical) positive, C16 BLCA positive, C17 TGCT positive, C19 COAD/READ positive, C20 SARC/MESO positive, C21 KIRC/KICH/KIRP positive, C22 Liver positive, C24 BRCA/Luminal positive, C25 THYM positive, C26 SKCM/UVM positive or C28 THCA positive, C1 ACC/PCPG negative, C2 GBM/LGG negative, C3 OV negative, C4 Squamous-like negative, C6 LUAD-Enriched negative, C8 PAAD/some STAD negative, C9 UCS negative, C10 BRCA/Basal negative, C12 UCEC negative, C14 PRAD negative, C15 CESC (subset of cervical) negative, C16 BLCA negative, C17 TGCT negative, C19 COAD/READ negative, C20 SARC/MESO negative, C21 KIRC/KICH/KIRP negative, C22 Liver negative, C24 BRCA/Luminal negative, C25 THYM negative, C26 SKCM/UVM negative or C28 THCA negative or a combination thereof.

In some embodiments, results are classified using a trained algorithm. Trained algorithms provided herein include algorithms that have been developed using a reference set of known gene expression values and/or normal samples, for example, samples from individuals diagnosed with a particular molecular COCA subtype of cancer. In some cases, a reference set of known gene expression values is obtained from individuals who have been diagnosed with a particular COCA subtype of cancer. In some cases, a reference set of known gene expression values is obtained from individuals who have been diagnosed with a particular COCA subtype of cancer, and are also known to possess a certain immune cell signature. In some cases, a reference set of known gene expression values is obtained from individuals who have been diagnosed with a particular COCA subtype of cancer, and are also known to have certain expression of tumor driver genes.

Algorithms suitable for categorization of samples include but are not limited to k-nearest neighbor algorithms, support vector machines, linear discriminant analysis, centroid algorithms (e.g., ClaNC), diagonal linear discriminant analysis, updown, naive Bayesian algorithms, neural network algorithms, hidden Markov model algorithms, genetic algorithms, or any combination thereof.

When a binary classifier is compared with actual true values (e.g., values from a biological sample), there are typically four possible outcomes. If the outcome from a prediction is p (where “p” is a positive classifier output, such as the presence of a deletion or duplication syndrome) and the actual value is also p, then it is called a true positive (TP); however, if the actual value is n then it is said to be a false positive (FP). Conversely, a true negative has occurred when both the prediction outcome and the actual value are n (where “n” is a negative classifier output, such as no deletion or duplication syndrome), and a false negative is when the prediction outcome is n while the actual value is p. In one embodiment, consider a test that seeks to determine whether a person is likely or unlikely to respond to angiogenesis inhibitor therapy. A false positive in this case occurs when the person tests positive, but actually does not respond. A false negative, on the other hand, occurs when the person tests negative, suggesting they are unlikely to respond, when they actually are likely to respond. The same holds true for classifying a COCA subtype.

The positive predictive value (PPV), or precision rate, or post-test probability of disease, is the proportion of subjects with positive test results who are correctly diagnosed as likely or unlikely to respond, or diagnosed with the correct COCA subtype, or a combination thereof. It reflects the probability that a positive test reflects the underlying condition being tested for. Its value does however depend on the prevalence of the disease, which may vary. In one example, the following characteristics are provided: FP (false positive); TN (true negative); TP (true positive); FN (false negative). False positive rate (α) = FP/(FP+TN) = 1 − specificity; False negative rate (β) = FN/(TP+FN) = 1 − sensitivity; Power = sensitivity = 1 − β; Likelihood-ratio positive = sensitivity/(1 − specificity); Likelihood-ratio negative = (1 − sensitivity)/specificity. The negative predictive value (NPV) is the proportion of subjects with negative test results who are correctly diagnosed.
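The quantities defined above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch for illustration only; the function name `diagnostic_metrics` and the example counts are hypothetical and are not part of the disclosed methods.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return common accuracy measures from confusion-matrix counts.

    Assumes every denominator is non-zero (e.g., specificity is neither
    0 nor 1 when the likelihood ratios are requested).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (power)
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),   # alpha = 1 - specificity
        "false_negative_rate": fn / (tp + fn),   # beta = 1 - sensitivity
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts: 100 true responders and 100 true non-responders.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV are computed from the counts of the tested population, they shift with prevalence, whereas sensitivity and specificity do not.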

In some embodiments, the results of the biomarker level analysis of the subject methods provide a statistical confidence level that a given diagnosis is correct. In some embodiments, such statistical confidence level is at least about, or more than about 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or more.

In some embodiments, the method further includes classifying the tumor tissue sample as a particular COCA subtype based on the comparison of biomarker levels in the sample and reference biomarker levels, for example present in at least one training set. In some embodiments, the tumor tissue sample is classified as a particular subtype if the results of the comparison meet one or more criteria such as, for example, a minimum percent agreement, a value of a statistic calculated based on the percentage agreement such as (for example) a kappa statistic, a minimum correlation (e.g., Pearson's correlation) and/or the like.
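The comparison criteria named above (percent agreement, a kappa statistic, and Pearson's correlation) can be sketched as follows. This is an illustrative example only: the function names, the subtype labels "C1" to "C3", and the threshold values are assumptions chosen for the example and are not specified in this disclosure.

```python
from collections import Counter
from math import sqrt


def percent_agreement(calls_a, calls_b):
    """Fraction of samples given the same subtype call by both methods."""
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(calls_a)
    observed = percent_agreement(calls_a, calls_b)
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def pearson(x, y):
    """Pearson's correlation between two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def meets_criteria(calls_a, calls_b, min_agreement=0.85, min_kappa=0.80):
    """Hypothetical acceptance rule; both thresholds are illustrative."""
    return (percent_agreement(calls_a, calls_b) >= min_agreement
            and cohen_kappa(calls_a, calls_b) >= min_kappa)


# Hypothetical subtype calls from a gene signature and a reference method.
signature = ["C1", "C1", "C2", "C2", "C3", "C3", "C1", "C2"]
reference = ["C1", "C1", "C2", "C3", "C3", "C3", "C1", "C2"]
print(percent_agreement(signature, reference))
print(cohen_kappa(signature, reference))
print(meets_criteria(signature, reference))
```

Kappa is lower than raw agreement here because some matches would be expected by chance given how often each label is called.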

It is intended that the methods described herein can be performed by software (stored in memory and/or executed on hardware), hardware, or a combination thereof. Hardware modules may include, for example, a general-purpose processor, a field programmable gate array (FPGA), and/or an application specific integrated circuit (ASIC). Software modules (executed on hardware) can be expressed in a variety of software languages (e.g., computer code), including Unix utilities, C, C++, Java™, Ruby, SQL, SAS®, the R programming language/software environment, Visual Basic™, and other object-oriented, procedural, or other programming language and development tools. Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.

Some embodiments described herein relate to devices with a non-transitory computer-readable medium (also can be referred to as a non-transitory processor-readable medium or memory) having instructions or computer code thereon for performing various computer-implemented operations and/or methods disclosed herein. The computer-readable medium (or processor-readable medium) is non-transitory in the sense that it does not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable). The media and computer code (also can be referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of non-transitory computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM) and Random-Access Memory (RAM) devices. Other embodiments described herein relate to a computer program product, which can include, for example, the instructions and/or computer code discussed herein.

In some embodiments, a single biomarker, or from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 30, from about 34 to about 38, from about 38 to about 42, from about 42 to about 46, from about 46 to about 50, from about 50 to about 54, from about 54 to about 58, from about 58 to about 62, from about 62 to about 66, from about 66 to about 72, from about 72 to about 76, from about 76 to about 80, from about 80 to about 84 biomarkers (e.g., as disclosed in Table 1) is capable of classifying COCA subtypes of cancer with a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between. In some embodiments, any combination of biomarkers disclosed herein (e.g., in Table 1) can be used to obtain a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.

In some embodiments, a single biomarker, or from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 30, from about 34 to about 38, from about 38 to about 42, from about 42 to about 46, from about 46 to about 50, from about 50 to about 54, from about 54 to about 58, from about 58 to about 62, from about 62 to about 66, from about 66 to about 72, from about 72 to about 76, from about 76 to about 80, from about 80 to about 84 biomarkers (e.g., as disclosed in Table 1) is capable of classifying COCA subtypes of cancer with a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between. In some embodiments, any combination of biomarkers disclosed herein can be used to obtain a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.

Classifier Biomarker Selection

In one embodiment, the methods and compositions provided herein are useful for determining the clustering of cluster assignments (COCA) subtype of a sample (e.g., tumor sample) from a patient by analyzing the expression of a set of biomarkers, whereby use of the set of biomarkers in detecting a COCA subtype comprises use of fewer biomarkers from a single genome-wide platform as compared to methods known in the art for molecularly classifying a cell of origin cancer subtype (e.g., Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304, and Hoadley et al. “Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.” Cell 158, no. 4 (2014): 929-944, both of which are herein incorporated by reference). In some cases, the set of biomarkers is less than 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 150, 100 or 90 biomarkers. In some cases, the set of biomarkers is between 4 and 84 biomarkers. In some cases, the set of biomarkers is the set of 84 biomarkers listed in Table 1. In some cases, the set of biomarkers is a sub-set of biomarkers listed in Table 1 such as, for example, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80 or 82 of the biomarkers listed in Table 1. The biomarkers or classifier biomarkers useful in the methods and compositions provided herein can be selected from one or more cancer datasets from one or more databases. The cancers can be any cancer known in the art. The cancers can include hematologic and lymphatic malignancies, solid tumor types, cancers of the central nervous system, cancers from neural-crest-derived tissues, and melanocytic cancers of the skin. The cancers for use in the methods herein can be the cancers studied in The Cancer Genome Atlas (TCGA) or a subset thereof. The cancers for use in the method provided herein can be those cancers listed herein. The databases can be public databases.

In one embodiment, classifier biomarkers (e.g., one or more genes listed in Table 1) useful in the methods and compositions provided herein for detecting or diagnosing subtypes were selected from a large data set of potential classifier biomarkers. In one embodiment, classifier biomarkers useful for the methods and compositions provided herein such as those in Table 1 are selected by subjecting a large set of classifier biomarkers to an in silico based process in order to determine the minimum number of genes whose expression profile can be used to determine a pan-cancer COCA subtype of a subject from a sample obtained from said subject. In some cases, the large set of classifier biomarkers can be a pan-cancer dataset such as, for example, the mRNA expression data (i.e., RNA-seq data) from TCGA found at gdc.cancer.gov/about-data/publications/pancanatlas. In some cases, the large set of classifier biomarkers can be the genes derived from the mRNA expression profile data derived from more than 10,000 tumors across more than 30 tumor types as described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304, which comprised one of several genome-wide molecular platforms that together can serve to define the gold standard (GS) COCA subtyper. The in silico process for selecting a gene signature as provided herein (e.g., Tables 1 and 2) for determining a COCA subtype of a sample from a patient can comprise applying or using a Classification to Nearest Centroid (CLaNC) algorithm on the pan-cancer mRNA expression data (i.e., RNA-seq data) from TCGA to choose a minimum number of correlated genes for each subtype. For determination of the optimal number of genes (e.g., 84 genes as shown in Table 1) to include in the signature, the process can further comprise performing a 5-fold cross validation using the TCGA pan-cancer dataset following application of the CLaNC algorithm as provided herein to produce cross-validation curves to test different numbers of correlated genes as shown in FIG. 3 in order to determine the minimum number of correlated genes needed per subtype. To get the final list of gene classifiers, the method can further comprise applying the CLaNC algorithm to the entire TCGA mRNA expression pan-cancer dataset. The CLaNC software used in the methods provided herein can be as found in or derived from Alan R. Dabney, ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22, Issue 1, 1 Jan. 2006, Pages 122-123.
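The general shape of the selection process described above (choose a number of genes per subtype, classify to the nearest centroid, and use 5-fold cross-validation to compare different gene counts) can be sketched as follows. This is a simplified stand-in and not the ClaNC software: the gene-ranking score (absolute difference between the subtype mean and the mean of all other samples), the squared Euclidean distance, the fold assignment, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def select_genes(X, y, per_class):
    """Pick `per_class` gene indices per subtype by class-vs-rest mean difference."""
    chosen = []
    for label in sorted(set(y)):
        inside = [row for row, lab in zip(X, y) if lab == label]
        outside = [row for row, lab in zip(X, y) if lab != label]
        scores = []
        for g in range(len(X[0])):
            diff = mean(r[g] for r in inside) - mean(r[g] for r in outside)
            scores.append((abs(diff), g))
        ranked = [g for _, g in sorted(scores, reverse=True)]
        # Each gene is assigned to at most one subtype.
        chosen.extend([g for g in ranked if g not in chosen][:per_class])
    return chosen


def fit_centroids(X, y, genes):
    """Mean expression of the selected genes within each subtype."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(sample, centroids, genes):
    """Assign the subtype whose centroid is closest (squared Euclidean)."""
    def dist(label):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
    return min(sorted(centroids), key=dist)


def cv_accuracy(X, y, per_class, folds=5):
    """k-fold cross-validated accuracy; genes are re-selected in every fold."""
    correct = 0
    for fold in range(folds):
        train = [i for i in range(len(X)) if i % folds != fold]
        test = [i for i in range(len(X)) if i % folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, per_class)
        centroids = fit_centroids(X_tr, y_tr, genes)
        correct += sum(predict(X[i], centroids, genes) == y[i] for i in test)
    return correct / len(X)


# Toy data (hypothetical): gene 0 marks subtype C1, gene 1 marks subtype C2.
X = [[9.0 + 0.1 * i, 1.0, 5.0, 4.0 + 0.1 * i] if i % 2 == 0
     else [1.0, 9.0 - 0.1 * i, 5.0, 4.0 + 0.1 * i] for i in range(10)]
y = ["C1" if i % 2 == 0 else "C2" for i in range(10)]

# Cross-validation curve: accuracy as a function of genes per subtype.
for per_class in (1, 2):
    print(per_class, cv_accuracy(X, y, per_class))
```

Re-selecting genes inside each fold keeps the held-out samples from influencing the signature, so the resulting curve is a fairer guide to the smallest gene count that performs adequately.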

In one embodiment, the method further comprises validating the gene classifiers. Validation can comprise testing the expression of the classifiers in a test set of samples and comparing the COCA subtype determined using the signature of Table 1 with the COCA subtype determined using the gold standard COCA subtyper method described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. The test set of samples can be any sample type provided herein such as, for example, fresh frozen or archived formalin-fixed paraffin-embedded (FFPE) cancer samples. In one embodiment, validation can comprise testing the expression of the classifiers in several fresh frozen publicly available array and/or RNAseq datasets and calling the subtype based on said expression levels and subsequently comparing the COCA subtype determined using the signature of Table 1 with the COCA subtype determined using the gold standard COCA subtyper method described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In other words, validation can comprise calling the subtypes of the several fresh frozen publicly available array and RNAseq test datasets using their expression levels and the CLaNC algorithm as described herein and comparing the subtype calls with the gold standard subtype calls as defined in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. Final validation of the gene signature (e.g., Table 1) can then be performed in a newly collected dataset of archived formalin-fixed paraffin-embedded (FFPE) cancer samples to assure comparable performance in the FFPE samples. In one embodiment, the classifier biomarkers of Table 1 were selected based on the in silico CLaNC process described herein. The gene symbols and official gene names are listed in Table 1. Further to the above embodiments, the in silico CLaNC process can entail use of the CLaNC process described in Dabney (2005) Bioinformatics 21(22):4148-4154. In one embodiment, the in silico CLaNC process can entail use of CLaNC software described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives related thereto.

In one embodiment, the methods provided herein require the detection of the expression level of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83 or up to 84 classifier biomarkers (e.g., from Table 1) in a cancer sample obtained from a patient whose expression is altered in order to identify a COCA cancer subtype. The same applies for other classifier biomarker expression datasets as provided herein.

In another embodiment, the methods provided herein require the detection of the expression level of a total of at least 2, at least 4, at least 6, at least 8, at least 10, at least 12, at least 14, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 36, at least 38, at least 40, at least 42, at least 44, at least 46, at least 48, at least 50, at least 52, at least 54, at least 56, at least 58, at least 60, at least 62, at least 64, at least 66, at least 68, at least 70, at least 72, at least 74, at least 76, at least 78, at least 80, at least 82 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 in a cancer cell sample obtained from a patient in order to identify a COCA cancer subtype. In another embodiment, the methods provided herein require the detection of the expression level of a total of at least 4, at least 8, at least 12, at least 16, at least 20, at least 24, at least 28, at least 32, at least 36, at least 40, at least 44, at least 48, at least 52, at least 56, at least 60, at least 64, at least 68, at least 72, at least 76, at least 80 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 in a cancer cell sample obtained from a patient in order to identify a COCA cancer subtype. The same applies for other classifier biomarker expression datasets as provided herein.

In one embodiment, the expression level of one or more classifier biomarkers of Table 1 can be altered in a specific COCA subtype as detected in a sample obtained from a subject as described in any of the methods provided herein. The alteration of the expression level can be an “up-regulation” or “down-regulation” of the one or more classifier biomarkers of Table 1. In one embodiment, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 are “up-regulated” in a specific COCA subtype of cancer. In another embodiment, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 are “down-regulated” in a specific COCA subtype of cancer. In a still further embodiment, in methods provided herein utilizing more than one classifier biomarker (e.g., more than one classifier biomarker from Table 1) to determine a COCA subtype, the alteration in expression levels of the more than one classifier biomarkers can either be an up-regulation, a down-regulation or any combination thereof. Further to any of the above embodiments, the alteration of the expression level can be relative to or compared to a sample isolated from a healthy subject as defined herein. The sample obtained from the healthy subject can be from the same anatomical area of the body. The same applies for other classifier biomarker expression datasets as provided herein.

In one embodiment, the expression level of an “up-regulated” biomarker as provided herein is increased by about 0.2-fold, about 0.5-fold, about 1-fold, about 1.5-fold, about 2-fold, about 2.5-fold, about 3-fold, about 3.5-fold, about 4-fold, about 4.5-fold, about 5-fold, and any values in between. In another embodiment, the expression level of a “down-regulated” biomarker as provided herein is decreased by about 0.2-fold, about 0.5-fold, about 1-fold, about 1.5-fold, about 2-fold, about 2.5-fold, about 3-fold, about 3.5-fold, about 4-fold, about 4.5-fold, about 5-fold, and any values in between.

It is recognized that additional genes or proteins or molecular platforms can be used in the practice of the methods provided herein. In general, genes useful in classifying the COCA subtypes of cancer include those that are independently capable of distinguishing between normal versus tumor, or between different classes or grades of cancer. A gene is considered to be capable of reliably distinguishing between COCA subtypes if the area under the receiver operating characteristic (ROC) curve is approximately 1. Further, in general, molecular platforms that generate data that can be useful in classifying the COCA subtypes of cancer can include genome-wide platforms such as, for example, whole-exome DNA sequencing assays (e.g., Illumina HiSeq and GAII), DNA copy-number variation assays (e.g., Affymetrix 6.0 microarrays), DNA methylation assays (e.g., Illumina 450,000-feature microarrays), genome-wide mRNA level assays (e.g., Illumina mRNA-seq), microRNA level assays (e.g., Illumina microRNA-seq), and protein level assays for proteins and/or phosphorylated proteins (e.g., Reverse Phase Protein Arrays; RPPA).
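The ROC criterion mentioned above can be illustrated for a single gene. The sketch below computes the area under the ROC curve as the probability that a sample in the subtype of interest has a higher expression value than a sample outside it, with ties counted as one half. The function name `roc_auc` and the expression values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    # Count the pairs in which the positive sample scores higher; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical expression of one gene in six samples; True marks the subtype of interest.
expression = [8.1, 7.9, 7.5, 3.2, 2.8, 7.6]
in_subtype = [True, True, True, False, False, False]
print(roc_auc(expression, in_subtype))
```

An area close to 1 means that nearly every in-subtype sample outranks every other sample for that gene, while an area near 0.5 means the gene carries little discriminating information on its own.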

Clinical/Therapeutic Uses

In one embodiment, a method is provided herein for determining a disease outcome or prognosis for a patient suffering from cancer. In some cases, the cancer can be any cancer known in the art and/or provided herein. The disease outcome or prognosis can be measured by examining the overall survival for a period of time or intervals (e.g., 0 to 36 months or 0 to 60 months). In one embodiment, survival is analyzed as a function of COCA subtype. In one embodiment, survival is analyzed as a function of COCA subtype across tissue of origin tumor types. In one embodiment, survival is analyzed as a function of COCA subtype within a tissue of origin tumor type (see, for example, FIGS. 6-8). The COCA subtype can be determined using the methods provided herein such as, for example, determining the expression of all or subsets of the genes in Table 1. Relapse-free and overall survival can be assessed using standard Kaplan-Meier plots as well as Cox proportional hazards modeling.
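The Kaplan-Meier estimate referred to above can be sketched as follows for one group of patients; applying it separately to each COCA subtype gives the per-subtype survival curves. This is an illustrative example only: the function name, the follow-up times (in months), and the event indicators are hypothetical, and Cox proportional hazards modeling is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the event (e.g., death) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data grouped by subtype label.
cohorts = {
    "C1": ([6, 12, 12, 20, 30, 36], [1, 1, 0, 1, 0, 0]),
    "C2": ([3, 5, 9, 14, 18, 24], [1, 1, 1, 0, 1, 1]),
}
for subtype, (times, events) in cohorts.items():
    print(subtype, kaplan_meier(times, events))
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at times where an event was observed.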

In one embodiment, the methods and compositions as provided herein for determining a COCA subtype of a patient suffering or suspected of suffering from cancer are used to determine whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy. The sample can be any type of sample obtained from the patient as provided herein. The cancer can be any type of cancer known in the art and/or provided herein. In one embodiment, determining the COCA subtype is one of a number of methods that can be employed to characterize the sample obtained from the patient such that determining the COCA subtype alone or in combination with one or more of the number of methods can be used to determine whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy. In addition to assessing or determining a COCA subtype, the number of methods for characterizing the sample can entail determining a proliferation score, the tumor mutation burden (TMB), the tissue of origin subtype, the level of immune activation or any combination thereof. In one embodiment, one or all of the methods for characterizing the sample can be performed on RNA sequencing data obtained from the sample.

In one embodiment, in addition to assessing the COCA subtype as provided herein, the characterization entails determining proliferation or a proliferation score. In one embodiment, proliferation or the proliferation score is determined using any method known in the art such as, for example, as provided in U.S. 62/789,668 filed Jan. 8, 2019, which is herein incorporated by reference.

In one embodiment, in addition to determining the COCA subtype as provided herein, the characterization entails calculating a TMB value and/or rate. The TMB value and/or rate can be calculated using any method known in the art. In one embodiment, the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is herein incorporated by reference.

The determination of whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy can be based on the COCA subtype alone or in combination with other methods known in the art for characterizing a sample obtained from a patient suffering from or suspected of suffering from cancer. The other methods for characterizing said sample can be histologically based methods, gene expression based methods or a combination thereof. The histologically based methods can include histological cancer subtyping by one or more trained pathologists as well as the histological based methods of assessing proliferation such as, for example, determining the mitotic activity index. The gene expression based methods can include subtyping, assessment of TMB, assessment of tissue of origin subtype, immune subtyping or any combination thereof. The gene expression based methods can be assessed from DNA, RNA or a combination thereof. In one embodiment, the characterization of the sample obtained from the patient suffering from or suspected of suffering from cancer is performed on RNA obtained or isolated from the sample.

The gene expression based tissue of origin cancer subtyping can be determined using gene signatures known in the art for specific types of cancer. In one embodiment, the tissue of origin of the cancer is the lung and the gene signature is selected from the gene signatures found in WO2017/201165, WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153, each of which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is head and neck squamous cell carcinoma (HNSCC) and the gene signature is selected from the gene signatures found in PCT/US18/45522 or PCT/US18/48862, each of which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is breast cancer and the gene signature is the PAM50 subtyper found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is bladder cancer (e.g., MIBC) and the gene signature is selected from the gene signatures found in 62/629,975 filed Feb. 13, 2018, which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is bladder cancer (e.g., MIBC) and the gene signature is selected from the gene signature found in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference in its entirety.

The gene expression based immune subtyping or immune cell activation can be determined using immune expression signatures known in the art such as, for example, the gene signatures found in Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, which is herein incorporated by reference in its entirety. In one embodiment, immune cell activation is determined by monitoring the immune cell signatures of Bindea et al. (Immunity 2013; 39(4); 782-795), the contents of which are herein incorporated by reference in their entirety. In one embodiment, the method further comprises measuring single gene immune biomarkers, such as, for example, CTLA4, PDCD1 and CD274 (PD-L1), PDCD1LG2 (PD-L2) and/or IFN gene signatures. In one embodiment, the level of immune cell activation is determined by measuring gene expression signatures of immunomarkers. The immunomarkers can be measured in the same and/or different sample used to determine the COCA subtype as described herein. The immunomarkers can be those found in WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference in its entirety.

The gene expression based method for calculating a TMB value and/or rate can be any method known in the art. In one embodiment, the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is herein incorporated by reference.

In one embodiment, upon determining a patient's COCA subtype (e.g., by measuring the expression of all or subsets of the genes in Table 1), the patient is selected for suitable therapy, for example, radiotherapy (radiation therapy), surgical intervention, targeted therapy, chemotherapy or drug therapy with an angiogenesis inhibitor or immunotherapy or combinations thereof. In some embodiments, the suitable treatment can be any treatment or therapeutic method that can be used for a cancer patient. In one embodiment, upon determining a patient's COCA subtype, the patient is administered a suitable therapeutic agent, for example chemotherapeutic agent(s) or an angiogenesis inhibitor or immunotherapeutic agent(s). In one embodiment, the therapy is immunotherapy, and the immunotherapeutic agent is a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy. In some embodiments, the determination of a suitable treatment can identify treatment responders. In some embodiments, the determination of a suitable treatment can identify treatment non-responders. In some embodiments, upon determining a patient's COCA subtype, the cancer patient can be selected for any combination of suitable therapies, for example, chemotherapy or drug therapy with a radiotherapy, a tumor dissection with an immunotherapy, or a chemotherapeutic agent with a radiotherapy. In some embodiments, the immunotherapy or immunotherapeutic agent can be a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.

The methods provided herein are also useful for evaluating clinical response to therapy, as well as for endpoints in clinical trials for efficacy of new therapies. The extent to which sequential diagnostic expression profiles move towards normal can be used as one measure of the efficacy of the candidate therapy.
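
By way of non-limiting illustration only, the following sketch shows one way the movement of sequential expression profiles towards normal could be quantified. The Euclidean distance metric, the gene names and all expression values are assumptions made for this example and are not specified by the present disclosure.

```python
import math

def distance_to_normal(profile, normal):
    """Euclidean distance between a profile and a normal reference over shared genes."""
    genes = sorted(set(profile) & set(normal))
    if not genes:
        raise ValueError("no shared genes between profile and reference")
    return math.sqrt(sum((profile[g] - normal[g]) ** 2 for g in genes))

def moves_toward_normal(sequential_profiles, normal):
    """True if each later profile is closer to the normal reference than the previous one."""
    d = [distance_to_normal(p, normal) for p in sequential_profiles]
    return all(later < earlier for earlier, later in zip(d, d[1:]))

# Hypothetical log-scale expression values (illustrative only)
normal = {"GENE_A": 1.0, "GENE_B": 2.0}
baseline = {"GENE_A": 4.0, "GENE_B": 6.0}
week_6 = {"GENE_A": 2.5, "GENE_B": 4.0}
week_12 = {"GENE_A": 1.2, "GENE_B": 2.1}

print(moves_toward_normal([baseline, week_6, week_12], normal))
```

In such a sketch, a strictly decreasing distance across sequential samples would be read as one possible indicator of efficacy; other distance measures or reference definitions could equally be used.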

In one embodiment, the methods provided herein also find use in predicting response to different lines of therapies based on the COCA subtype of cancer alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, immune subtype, proliferation and/or TMB status). For example, chemotherapeutic response can be improved by more accurately assigning tumor cell of origin subtypes. Likewise, treatment regimens can be formulated based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, immune subtype, proliferation and/or TMB status).

Immunotherapy

In one embodiment, provided herein is a method for determining whether a cancer patient is likely to respond to immunotherapy by determining the COCA subtype of cancer of a sample obtained from the patient and, based on the COCA subtype, assessing whether the patient is likely to respond to immunotherapy. In another embodiment, provided herein is a method of selecting a patient suffering from cancer for immunotherapy by determining a COCA subtype of a sample from the patient and, based on the COCA subtype, selecting the patient for immunotherapy. The determination of the COCA subtype of the sample obtained from the patient can be performed using any method for COCA subtyping known in the art or any method for COCA subtyping provided herein. In one embodiment, the sample obtained from the patient has been previously diagnosed as being a particular type of cancer, and the methods provided herein are used to determine the COCA subtype of the sample. The previous diagnosis can be based on a histological analysis. The histological analysis can be performed by one or more pathologists. In one embodiment, the COCA subtyping is performed via gene expression analysis of a set or panel of biomarkers or subsets thereof in order to generate an expression profile. The gene expression analysis can be performed on a tumor sample obtained from a patient in order to determine the presence, absence or level of expression of one or more biomarkers selected from a publicly available pan-cancer database described herein and/or Table 1 provided herein. The COCA subtype can be selected from the group consisting of C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA). The immunotherapy can be any immunotherapy provided herein. In one embodiment, the immunotherapy comprises administering one or more checkpoint inhibitors. The checkpoint inhibitors can be any checkpoint inhibitor provided herein such as, for example, a checkpoint inhibitor that targets PD-1, PD-L1 or CTLA4.

As disclosed herein, the biomarker panels, or subsets thereof, can be those disclosed in any publicly available pan-cancer gene expression dataset or datasets. In one embodiment, the biomarker panel or subset thereof is, for example, The Cancer Genome Atlas (TCGA) pan-cancer mRNA expression dataset. In one embodiment, the biomarker panel or subset thereof is, for example, the pan-cancer mRNA expression dataset disclosed in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304, the contents of which are herein incorporated by reference in their entirety. In one embodiment, the biomarker panel or subset thereof is, for example, the gene expression signature disclosed in Table 1 in combination with one or more biomarkers from a publicly available pan-cancer expression dataset.

In one embodiment, from about 1 to about 4, from about 4 to about 8, from about 4 to about 12, from about 4 to about 16, from about 4 to about 20, from about 4 to about 24, from about 4 to about 28, from about 4 to about 32, from about 4 to about 36, from about 4 to about 40, from about 4 to about 44, from about 4 to about 48, from about 4 to about 52, from about 4 to about 56, from about 4 to about 60, from about 4 to about 64, from about 4 to about 68, from about 4 to about 72, from about 4 to about 76, from about 4 to about 80 or from about 4 to about 84 of the biomarkers in any of the pan-cancer gene expression datasets provided herein, including, for example, Table 1, are detected in a tumor sample in a method to determine the COCA subtype as provided herein. In another embodiment, each of the biomarkers from any one of the pan-cancer gene expression datasets provided herein, including, for example, Table 1, is detected in a tumor sample in a method to determine the COCA subtype as provided herein.

In one embodiment, the methods provided herein further comprise determining the presence, absence or level of immune activation in a COCA subtype. The presence or level of immune cell activation can be determined by creating an expression profile or detecting the expression of one or more biomarkers associated with innate immune cells and/or adaptive immune cells associated with each COCA subtype in a sample obtained from a patient. In one embodiment, immune cell activation associated with a COCA subtype of cancer is determined by monitoring the immune cell signatures of Thorsson, V. et al., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell reports, 18, 248-262 (2017) and/or WO2017/201165 and WO2017/201164, the contents of each of which are herein incorporated by reference in their entirety. In one embodiment, the method further comprises measuring single gene immune biomarkers, such as, for example, CTLA4, PDCD1, CD274 (PD-L1) and PDCD1LG2 (PD-L2), and/or IFN gene signatures. The presence or a detectable level of immune activation (innate and/or adaptive) associated with a COCA subtype can indicate or predict that a patient with said COCA subtype may be amenable to immunotherapy. The immunotherapy can be treatment with a checkpoint inhibitor as provided herein. In one embodiment, a method provided herein for detecting the expression of at least one classifier biomarker provided herein in a sample (e.g., tumor sample) obtained from a patient further comprises administering an immunotherapeutic agent following detection of immune activation as provided herein in said sample.

In one embodiment, the method comprises determining a COCA subtype of a tumor sample and subsequently determining a level of immune cell activation of said subtype. In one embodiment, the subtype is determined by determining the expression levels of one or more classifier biomarkers at the nucleic acid level using sequencing (e.g., RNASeq), amplification (e.g., qRT-PCR) or hybridization assays (e.g., microarray analysis) as described herein. The one or more biomarkers can be selected from a publicly available database (e.g., TCGA pan-cancer mRNA expression datasets or any other publicly available pan-cancer gene expression datasets provided herein). In some embodiments, the biomarkers of Table 1 can be used to specifically determine the COCA subtype of a tumor sample obtained from a patient. In one embodiment, the level of immune cell activation is determined by measuring gene expression signatures of immunomarkers. The immunomarkers can be measured in the same and/or different sample used to subtype the tumor sample as described herein. The immunomarkers that can be measured can comprise, consist of, or consist essentially of innate immune cell (IIC) and/or adaptive immune cell (AIC) gene signatures, interferon (IFN) gene signatures, individual immunomarkers, major histocompatibility complex class II (MHC class II) genes or a combination thereof. The gene expression signatures for IICs, AICs, IFN and MHC class II can be any known gene signatures for said cell types or genes known in the art. For example, the immune gene signatures can be those from Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell reports, 18, 248-262 (2017) and/or WO2017/201165 and WO2017/201164. The individual immunomarkers can be CTLA4, PDCD1 and CD274 (PD-L1). In one embodiment, immune subtyping or immune cell activation can be determined using the gene signatures found in Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830.

In one embodiment, upon determining a patient's COCA cancer subtype using any of the methods and classifier biomarker panels or subsets thereof as provided herein, the patient is selected for treatment with or administered an immunotherapeutic agent. The immunotherapeutic agent can be a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.

In another embodiment, the immunotherapeutic agent is a checkpoint inhibitor. In some cases, a method for determining the likelihood of response to one or more checkpoint inhibitors is provided. In one embodiment, the checkpoint inhibitor is a PD-1/PD-L1 checkpoint inhibitor. The PD-1/PD-L1 checkpoint inhibitor can be nivolumab, pembrolizumab, atezolizumab, durvalumab, lambrolizumab, or avelumab. In one embodiment, the checkpoint inhibitor is a CTLA-4 checkpoint inhibitor. The CTLA-4 checkpoint inhibitor can be ipilimumab or tremelimumab. In one embodiment, the checkpoint inhibitor is a combination of checkpoint inhibitors such as, for example, a combination of one or more PD-1/PD-L1 checkpoint inhibitors used in combination with one or more CTLA-4 checkpoint inhibitors.

In one embodiment, the immunotherapeutic agent is a monoclonal antibody. In some cases, a method for determining the likelihood of response to one or more monoclonal antibodies is provided. The monoclonal antibody can be directed against tumor cells or directed against tumor products. The monoclonal antibody can be panitumumab, matuzumab, necitumumab, trastuzumab, amatuximab, bevacizumab, ramucirumab, bavituximab, patritumab, rilotumumab, cetuximab, IMMU-132, or demcizumab.

In yet another embodiment, the immunotherapeutic agent is a therapeutic vaccine. In some cases, a method for determining the likelihood of response to one or more therapeutic vaccines is provided. The therapeutic vaccine can be a peptide or tumor cell vaccine. The vaccine can target MAGE-3 antigens, NY-ESO-1 antigens, p53 antigens, survivin antigens, or MUC1 antigens. The therapeutic cancer vaccine can be GVAX (GM-CSF gene-transfected tumor cell vaccine), belagenpumatucel-L (allogeneic tumor cell vaccine made with four irradiated NSCLC cell lines modified with TGF-beta2 antisense plasmid), MAGE-A3 vaccine (composed of MAGE-A3 protein and adjuvant AS15), L-BLP-25 anti-MUC-1 (targets MUC-1 expressed on tumor cells), CimaVax EGF (vaccine composed of human recombinant Epidermal Growth Factor (EGF) conjugated to a carrier protein), WT1 peptide vaccine (composed of four Wilms' tumor suppressor gene analogue peptides), CRS-207 (live-attenuated Listeria monocytogenes vector encoding human mesothelin), Bec2/BCG (induces anti-GD3 antibodies), GV1001 (targets the human telomerase reverse transcriptase), TG4010 (targets the MUC1 antigen), racotumomab (anti-idiotypic antibody which mimics the NGcGM3 ganglioside that is expressed on multiple human cancers), tecemotide (liposomal BLP25; liposome-based vaccine made from tandem repeat region of MUC1) or DRibbles (a vaccine made from nine cancer antigens plus TLR adjuvants).

In one embodiment, the immunotherapeutic agent is a biological response modifier. In some cases, a method for determining the likelihood of response to one or more biological response modifiers is provided. The biological response modifier can trigger inflammation such as, for example, PF-3512676 (CpG 7909) (a toll-like receptor 9 agonist), CpG-ODN 2006 (downregulates Tregs), Bacillus Calmette-Guerin (BCG), or Mycobacterium vaccae (SRL172) (nonspecific immune stimulants now often tested as adjuvants). The biological response modifier can be cytokine therapy such as, for example, IL-2+tumor necrosis factor alpha (TNF-alpha) or interferon alpha (induces T-cell proliferation), interferon gamma (induces tumor cell apoptosis), or Mda-7 (IL-24) (Mda-7/IL-24 induces tumor cell apoptosis and inhibits tumor angiogenesis). The biological response modifier can be a colony-stimulating factor such as, for example, granulocyte colony-stimulating factor. The biological response modifier can be a multi-modal effector such as, for example, multi-target VEGFR: thalidomide and analogues such as lenalidomide and pomalidomide, cyclophosphamide, cyclosporine, denileukin diftitox, talactoferrin, trabectedin or all-trans-retinoic acid.

In one embodiment, the immunotherapy is cellular immunotherapy. In some cases, a method for determining the likelihood of response to one or more cellular therapeutic agents is provided. The cellular immunotherapeutic agent can be dendritic cells (DCs) (ex vivo generated DC-vaccines loaded with tumor antigens), T-cells (ex vivo generated lymphokine-activated killer cells; cytokine-induced killer cells; activated T-cells; gamma delta T-cells), or natural killer cells.

In some cases, specific COCA subtypes of cancer have different levels of immune activation (e.g., innate immunity and/or adaptive immunity) such that COCA subtypes with elevated or detectable immune activation (e.g., innate immunity and/or adaptive immunity) are selected for treatment with one or more immunotherapeutic agents described herein. In some cases, specific COCA subtypes of cancer have high or elevated levels of immune activation. In some cases, the C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and/or C28 (THCA) subtype has elevated levels of immune activation (e.g., innate immunity and/or adaptive immunity) as compared to other COCA subtypes. In some cases, the C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and/or C28 (THCA) subtype has reduced levels of immune activation (e.g., innate immunity and/or adaptive immunity) as compared to other COCA subtypes. In one embodiment, COCA subtypes with low levels of or no immune activation (e.g., innate immunity and/or adaptive immunity) are not selected for treatment with one or more immunotherapeutic agents described herein.
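
By way of non-limiting illustration only, the following sketch shows how a level of immune activation might be summarized from single gene immunomarkers and used to flag a sample for immunotherapy. The use of a simple mean as the score, the threshold value and the expression values are hypothetical assumptions for this example; the disclosure does not prescribe a particular scoring rule or cutoff.

```python
from statistics import mean

# Single gene immunomarkers named in the text; everything else here is assumed.
IMMUNOMARKERS = ("CTLA4", "PDCD1", "CD274")

def immune_activation_score(expression, markers=IMMUNOMARKERS):
    """Mean expression over the immunomarkers that are present in the sample."""
    values = [expression[g] for g in markers if g in expression]
    if not values:
        raise ValueError("sample contains none of the immunomarkers")
    return mean(values)

def select_for_immunotherapy(expression, threshold=5.0):
    """Flag a sample whose score meets a hypothetical, cohort-derived threshold."""
    return immune_activation_score(expression) >= threshold

# Hypothetical log-scale expression values for two tumor samples
activated = {"CTLA4": 6.0, "PDCD1": 7.0, "CD274": 5.0}
quiescent = {"CTLA4": 2.0, "PDCD1": 1.0, "CD274": 3.0}

print(select_for_immunotherapy(activated), select_for_immunotherapy(quiescent))
```

In practice, a threshold of this kind would need to be derived from a training cohort, and a multi-gene signature could replace the three markers used here.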

Angiogenesis Inhibitors

In one embodiment, upon determining a patient's or subject's COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), the patient is selected for drug therapy with an angiogenesis inhibitor.

In one embodiment, the angiogenesis inhibitor is a vascular endothelial growth factor (VEGF) inhibitor, a VEGF receptor inhibitor, a platelet derived growth factor (PDGF) inhibitor or a PDGF receptor inhibitor.

In general, methods of determining whether a patient is likely to respond to angiogenesis inhibitor therapy, or methods of selecting a patient for angiogenesis inhibitor therapy are provided herein. In one embodiment, the method comprises determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) and probing a sample from the patient for the levels of at least five hypoxia biomarkers selected from the group consisting of RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3, and C14ORF58 (see Table A) at the nucleic acid level. In a further embodiment, the probing step comprises mixing the sample with five or more oligonucleotides that are substantially complementary to portions of nucleic acid molecules of the at least five biomarkers under conditions suitable for hybridization of the five or more oligonucleotides to their complements or substantial complements; detecting whether hybridization occurs between the five or more oligonucleotides and their complements or substantial complements; and obtaining hybridization values of the sample based on the detecting step. The hybridization values of the sample are then compared to reference hybridization value(s) from at least one sample training set, wherein the at least one sample training set comprises (i) hybridization value(s) of the at least five biomarkers from a sample that overexpresses the at least five biomarkers, or overexpresses a subset of the at least five biomarkers, (ii) hybridization values of the at least five biomarkers from a reference COCA subtype-specific cancer sample, or (iii) hybridization values of the at least five biomarkers from a control or healthy sample. A determination of whether the patient is likely to respond to angiogenesis inhibitor therapy, or a selection of the patient for angiogenesis inhibitor therapy, is then made based upon (i) the patient's COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) and (ii) the results of the comparison.
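
By way of non-limiting illustration only, the following sketch shows one way the comparison of sample hybridization values to reference values from a training set could be carried out. The use of Pearson correlation as the similarity measure, the reference labels and all numeric values are assumptions made for this example; the disclosure does not limit the comparison to this approach.

```python
import math

HYPOXIA_GENES = ["RRAGD", "FABP5", "UCHL1", "GAL", "PLOD", "DDIT4", "VEGF",
                 "ADM", "ANGPTL4", "NDRG1", "NP", "SLC16A3", "C14ORF58"]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / (sx * sy)

def closest_reference(sample, references, min_genes=5):
    """Return (label, r) of the reference profile most correlated with the sample."""
    best_label, best_r = None, -2.0
    for label, ref in references.items():
        genes = [g for g in HYPOXIA_GENES if g in sample and g in ref]
        if len(genes) < min_genes:
            raise ValueError("at least five shared hypoxia biomarkers are required")
        r = pearson([sample[g] for g in genes], [ref[g] for g in genes])
        if r > best_r:
            best_label, best_r = label, r
    return best_label, best_r

# Hypothetical hybridization values (illustrative only)
references = {
    "hypoxia_high": {"RRAGD": 9.0, "FABP5": 8.0, "UCHL1": 3.0, "GAL": 7.0, "PLOD": 6.0},
    "control": {"RRAGD": 2.0, "FABP5": 3.0, "UCHL1": 8.0, "GAL": 2.5, "PLOD": 4.0},
}
sample = {"RRAGD": 8.5, "FABP5": 7.0, "UCHL1": 3.5, "GAL": 7.5, "PLOD": 5.5}

label, r = closest_reference(sample, references)
print(label, round(r, 3))
```

The label returned by such a comparison would then be considered together with the COCA subtype, as described above, rather than on its own.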

TABLE A Biomarkers for hypoxia profile

Abbreviation  Name                                                                  GenBank Accession No.
RRAGD         Ras-related GTP binding D                                             BC003088
FABP5         fatty acid binding protein 5                                          M94856
UCHL1         ubiquitin carboxyl-terminal esterase L1                               NM_004181
GAL           Galanin                                                               BC030241
PLOD          procollagen-lysine, 2-oxoglutarate 5-dioxygenase lysine hydroxylase   M98252
DDIT4         DNA-damage-inducible transcript 4                                     NM_019058
VEGF          vascular endothelial growth factor                                    M32977
ADM           Adrenomedullin                                                        NM_001124
ANGPTL4       angiopoietin-like 4                                                   AF202636
NDRG1         N-myc downstream regulated gene 1                                     NM_006096
NP            nucleoside phosphorylase                                              NM_000270
SLC16A3       solute carrier family 16 monocarboxylic acid transporters, member 3   NM_004207
C14ORF58      chromosome 14 open reading frame 58                                   AK000378

The aforementioned set of thirteen biomarkers, or a subset thereof, is also referred to herein as a “hypoxia profile”.

In one embodiment, the method provided herein includes determining the levels of at least five biomarkers, at least six biomarkers, at least seven biomarkers, at least eight biomarkers, at least nine biomarkers, or at least ten biomarkers, or five to thirteen, six to thirteen, seven to thirteen, eight to thirteen, nine to thirteen or ten to thirteen biomarkers selected from RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3, and C14ORF58 in a sample obtained from a subject. Biomarker expression in some instances may be normalized against the expression levels of all RNA transcripts or their expression products in the sample, or against a reference set of RNA transcripts or their expression products. The reference set, as explained throughout, may be an actual sample that is tested in parallel with the sample, or may be a reference set of values from a database or stored dataset. Levels of expression, in one embodiment, are reported in number of copies, relative fluorescence value or detected fluorescence value. The level of expression of the biomarkers of the hypoxia profile together with the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) as determined using the methods provided herein can be used in the methods described herein to determine whether a patient is likely to respond to angiogenesis inhibitor therapy.

In one embodiment, the levels of expression of the thirteen biomarkers (or subsets thereof, as described above, e.g., five or more, from about five to about 13), are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample.
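
By way of non-limiting illustration only, the following sketch shows normalization of biomarker expression against a reference set of transcripts. It assumes log2-scale expression values, so that normalization is a subtraction of the mean reference level; the reference transcript names REF_1 and REF_2 and all values are hypothetical.

```python
from statistics import mean

def normalize_to_reference(expression, biomarkers, reference_genes):
    """Subtract the mean log2 level of the reference transcripts from each biomarker."""
    ref_level = mean(expression[g] for g in reference_genes)
    return {g: expression[g] - ref_level for g in biomarkers}

# Hypothetical log2 expression values from one sample
expression = {"VEGF": 10.0, "ADM": 8.0, "REF_1": 6.0, "REF_2": 8.0}

normalized = normalize_to_reference(expression, ["VEGF", "ADM"], ["REF_1", "REF_2"])
print(normalized)
```

Normalizing against all RNA transcripts in the sample would follow the same pattern, with the reference list replaced by every measured transcript.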

In one embodiment, angiogenesis inhibitor treatments include, but are not limited to, an integrin antagonist, a selectin antagonist, an adhesion molecule antagonist, an antagonist of intercellular adhesion molecule (ICAM)-1, ICAM-2, ICAM-3, platelet endothelial adhesion molecule (PECAM), vascular cell adhesion molecule (VCAM) or lymphocyte function-associated antigen 1 (LFA-1), a basic fibroblast growth factor antagonist, a vascular endothelial growth factor (VEGF) modulator, or a platelet derived growth factor (PDGF) modulator (e.g., a PDGF antagonist).

In one embodiment of determining whether a subject is likely to respond to an integrin antagonist, the integrin antagonist is a small molecule integrin antagonist, for example, an antagonist described by Paolillo et al. (Mini Rev Med Chem, 2009, volume 12, pp. 1439-1446, incorporated by reference in its entirety), or a leukocyte adhesion-inducing cytokine or growth factor antagonist (e.g., tumor necrosis factor-α (TNF-α), interleukin-1β (IL-1β), monocyte chemotactic protein-1 (MCP-1) and a vascular endothelial growth factor (VEGF)), as described in U.S. Pat. No. 6,524,581, incorporated by reference in its entirety herein.

The methods provided herein are also useful for determining whether a subject is likely to respond to one or more of the following angiogenesis inhibitors: interferon gamma 1β, interferon gamma 1β (Actimmune®) with pirfenidone, ACUHTR028, αVβ5, aminobenzoate potassium, amyloid P, ANG1122, ANG1170, ANG3062, ANG3281, ANG3298, ANG4011, anti-CTGF RNAi, Aplidin, Astragalus membranaceus extract with salvia and Schisandra chinensis, atherosclerotic plaque blocker, Azol, AZX100, BB3, connective tissue growth factor antibody, CT140, danazol, Esbriet, EXC001, EXC002, EXC003, EXC004, EXC005, F647, FG3019, Fibrocorin, Follistatin, FT011, a galectin-3 inhibitor, GKT137831, GMCT01, GMCT02, GRMD01, GRMD02, GRN510, Heberon Alfa R, interferon α-2β, ITMN520, JKB119, JKB121, JKB122, KRX168, LPA1 receptor antagonist, MGN4220, MIA2, microRNA 29a oligonucleotide, MMI0100, noscapine, PBI4050, PBI4419, PDGFR inhibitor, PF-06473871, PGN0052, Pirespa, Pirfenex, pirfenidone, plitidepsin, PRM151, Px102, PYN17, PYN22 with PYN17, Relivergen, rhPTX2 fusion protein, RXI109, secretin, STX100, TGF-β Inhibitor, transforming growth factor β-receptor 2 oligonucleotide, VA999260, XV615 or a combination thereof.

In another embodiment, a method is provided for determining whether a subject is likely to respond to one or more endogenous angiogenesis inhibitors. In a further embodiment, the endogenous angiogenesis inhibitor is endostatin (a 20 kDa C-terminal fragment derived from type XVIII collagen), angiostatin (a 38 kDa fragment of plasmin), or a member of the thrombospondin (TSP) family of proteins. In a further embodiment, the angiogenesis inhibitor is TSP-1, TSP-2, TSP-3, TSP-4 or TSP-5. Methods for determining the likelihood of response to one or more of the following angiogenesis inhibitors are also provided: a soluble VEGF receptor, e.g., soluble VEGFR-1 and neuropilin 1 (NRP1), angiopoietin-1, angiopoietin-2, vasostatin, calreticulin, platelet factor-4, a tissue inhibitor of metalloproteinase (TIMP) (e.g., TIMP1, TIMP2, TIMP3, TIMP4), cartilage-derived angiogenesis inhibitor (e.g., peptide troponin I and chondromodulin I), a disintegrin and metalloproteinase with thrombospondin motif 1, an interferon (IFN) (e.g., IFN-α, IFN-β, IFN-γ), a chemokine, e.g., a chemokine having the C-X-C motif (e.g., CXCL10, also known as interferon gamma-induced protein 10 or small inducible cytokine B10), an interleukin cytokine (e.g., IL-4, IL-12, IL-18), prothrombin, antithrombin III fragment, prolactin, the protein encoded by the TNFSF15 gene, osteopontin, maspin, canstatin, and proliferin-related protein.

In one embodiment, a method for determining the likelihood of response to one or more of the following angiogenesis inhibitors is provided: angiopoietin-1, angiopoietin-2, angiostatin, endostatin, vasostatin, thrombospondin, calreticulin, platelet factor-4, TIMP, CDAI, interferon α, interferon β, vascular endothelial growth factor inhibitor (VEGI), meth-1, meth-2, prolactin, VEGI, SPARC, osteopontin, maspin, canstatin, proliferin-related protein (PRP), restin, TSP-1, TSP-2, interferon gamma 1β, ACUHTR028, αVβ5, aminobenzoate potassium, amyloid P, ANG1122, ANG1170, ANG3062, ANG3281, ANG3298, ANG4011, anti-CTGF RNAi, Aplidin, Astragalus membranaceus extract with salvia and Schisandra chinensis, atherosclerotic plaque blocker, Azol, AZX100, BB3, connective tissue growth factor antibody, CT140, danazol, Esbriet, EXC001, EXC002, EXC003, EXC004, EXC005, F647, FG3019, Fibrocorin, Follistatin, FT011, a galectin-3 inhibitor, GKT137831, GMCT01, GMCT02, GRMD01, GRMD02, GRN510, Heberon Alfa R, interferon α-2β, ITMN520, JKB119, JKB121, JKB122, KRX168, LPA1 receptor antagonist, MGN4220, MIA2, microRNA 29a oligonucleotide, MMI0100, noscapine, PBI4050, PBI4419, PDGFR inhibitor, PF-06473871, PGN0052, Pirespa, Pirfenex, pirfenidone, plitidepsin, PRM151, Px102, PYN17, PYN22 with PYN17, Relivergen, rhPTX2 fusion protein, RXI109, secretin, STX100, TGF-β Inhibitor, transforming growth factor β-receptor 2 oligonucleotide, VA999260, XV615 or a combination thereof.

In yet another embodiment, the angiogenesis inhibitor can include pazopanib (Votrient), sunitinib (Sutent), sorafenib (Nexavar), axitinib (Inlyta), ponatinib (Iclusig), vandetanib (Caprelsa), cabozantinib (Cometriq), ramucirumab (Cyramza), regorafenib (Stivarga), ziv-aflibercept (Zaltrap), motesanib, or a combination thereof. In another embodiment, the angiogenesis inhibitor is a VEGF inhibitor. In a further embodiment, the VEGF inhibitor is axitinib, cabozantinib, aflibercept, brivanib, tivozanib, ramucirumab or motesanib. In yet a further embodiment, the angiogenesis inhibitor is motesanib.

In one embodiment, the methods provided herein relate to determining a subject's likelihood of response to an antagonist of a member of the platelet derived growth factor (PDGF) family, for example, a drug that inhibits, reduces or modulates the signaling and/or activity of PDGF-receptors (PDGFR). For example, the PDGF antagonist, in one embodiment, is an anti-PDGF aptamer, an anti-PDGF antibody or fragment thereof, an anti-PDGFR antibody or fragment thereof, or a small molecule antagonist. In one embodiment, the PDGF antagonist is an antagonist of PDGFR-α or PDGFR-β. In one embodiment, the PDGF antagonist is the anti-PDGF-β aptamer E10030, sunitinib, axitinib, sorafenib, imatinib, imatinib mesylate, nintedanib, pazopanib HCl, ponatinib, MK-2461, dovitinib, pazopanib, crenolanib, PP-121, telatinib, KRN 633, CP 673451, TSU-68, Ki8751, amuvatinib, tivozanib, masitinib, motesanib diphosphate, dovitinib dilactic acid or linifanib (ABT-869).

Upon making a determination of whether a patient is likely to respond to angiogenesis inhibitor therapy, or selecting a patient for angiogenesis inhibitor therapy, in one embodiment, the patient is administered the angiogenesis inhibitor. The angiogenesis inhibitor can be any of the angiogenesis inhibitors described herein.

Radiotherapy

In one embodiment, provided herein is a method for determining whether a patient is likely to respond to radiotherapy by determining the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample obtained from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), assessing whether the patient is likely to respond to or benefit from radiotherapy. In another embodiment, provided herein is a method of selecting a patient suffering from cancer for radiotherapy by determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), selecting the patient for radiotherapy.

In some embodiments, the radiotherapy can include, but is not limited to, proton therapy and external-beam radiation therapy. In some embodiments, the radiotherapy can include any type or form of treatment that is suitable for patients with specific types of cancer.

In some embodiments, a patient with a specific type of cancer can have or display resistance to radiotherapy. Radiotherapy resistance in any cancer or subtype thereof can be determined by measuring or detecting the expression levels of one or more genes known in the art and/or provided herein associated with or related to the presence of radiotherapy resistance. Genes associated with radiotherapy resistance can include NFE2L2, KEAP1 and CUL3. In some embodiments, radiotherapy resistance can be associated with alterations of the KEAP1 (Kelch-like ECH-associated protein 1)/NRF2 (nuclear factor E2-related factor 2) pathway. Association of a particular gene with radiotherapy resistance can be determined by examining expression of said gene in one or more patients known to be radiotherapy non-responders and comparing it with expression of said gene in one or more patients known to be radiotherapy responders.
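
By way of non-limiting illustration only, the following sketch compares expression of a gene between radiotherapy non-responders and responders. The use of Welch's t statistic as the comparison and all expression values are assumptions made for this example; the disclosure does not specify a particular statistical test.

```python
import math
from statistics import mean, variance

def welch_t(non_responders, responders):
    """Welch's t statistic for the difference in mean expression between two groups."""
    m1, m2 = mean(non_responders), mean(responders)
    v1, v2 = variance(non_responders), variance(responders)
    se = math.sqrt(v1 / len(non_responders) + v2 / len(responders))
    if se == 0:
        raise ValueError("t statistic is undefined when both groups have zero variance")
    return (m1 - m2) / se

# Hypothetical log-scale expression of one gene (e.g., NFE2L2) in each group
non_responders = [8.0, 9.0, 10.0]
responders = [5.0, 6.0, 7.0]

t = welch_t(non_responders, responders)
print(round(t, 3))
```

A large positive statistic in such a sketch would suggest higher expression in non-responders; any conclusion would depend on cohort size and an appropriate significance assessment.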

Surgical Intervention

In one embodiment, provided herein is a method for determining whether a cancer patient is likely to respond to surgical intervention by determining the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample obtained from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), assessing whether the patient is likely to respond to or benefit from surgery. In another embodiment, provided herein is a method of selecting a patient suffering from cancer for surgery by determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), selecting the patient for surgery. In some embodiments, the surgery can include laser technology, excision, dissection, and reconstructive surgery.

Prediction of Overall Survival Rate and Metastasis for Cancer Patients

The present disclosure provides methods for predicting overall survival rate for a cancer patient. In some embodiments, the prediction of overall survival rate can involve obtaining a tumor sample from a cancer patient. In some embodiments, the cancer patients can have various stages of cancers. In some embodiments, the overall survival rate can be determined by detecting the expression level of at least one subtype classifier of a publicly available pan-cancer database or dataset. In some embodiments, an overall survival rate can be determined by detecting the expression level (e.g., protein and/or nucleic acid) of any subtype classifiers that are relevant across many types of cancer, for example, subtype classifiers relevant to cell of origin. In one embodiment, the subtype classifiers can be all or a subset of classifiers from Table 1. In some embodiments, the identification of the cell of origin (COCA) subtype is indicative of the overall survival in the patient. In some embodiments, the COCA subtype is selected from C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA.

The present disclosure provides methods for predicting nodal metastasis for a cancer patient. In some embodiments, the prediction of nodal metastasis can involve obtaining a tumor sample from a patient. In some embodiments, the patients can have various stages of cancers. In some embodiments, the nodal metastasis can be determined by detecting the expression level of at least one subtype classifier from a pan-cancer gene set. The pan-cancer gene set can be a publicly available pan-cancer database or a gene set provided herein (e.g., Table 1) or a combination thereof. The publicly available pan-cancer gene set can be a TCGA pan-cancer gene set. In one embodiment, nodal metastasis of cancer can be determined by detecting the expression level of all the subtype classifiers, or subsets thereof, found in Table 1.

In some embodiments, the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be more likely to be associated with nodal metastasis compared with other subtypes. In some embodiments, the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be most likely associated with positive lymph node metastasis compared with other subtypes. In some embodiments, the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be at least about 0.1 times, at least about 0.2 times, at least about 0.3 times, at least about 0.4 times, at least about 0.5 times, at least about 0.6 times, at least about 0.7 times, at least about 0.8 times, at least about 0.9 times, at least about 1 time, at least about 1.2 times, at least about 1.5 times, at least about 1.7 times, at least about 2.0 times, at least about 2.2 times, at least about 2.5 times, at least about 2.7 times, at least about 3.0 times, at least about 3.2 times, at least about 3.5 times, at least about 3.7 times, at least about 4.0 times, at least about 4.2 times, at least about 4.5 times, at least about 4.7 times, at least about 5.0 times, inclusive of all ranges and subranges therebetween, more likely to have occult nodal metastasis compared to other COCA subtypes.

Detection Methods

In one embodiment, the methods and compositions provided herein allow for the detection of at least one biomarker in a tumor sample obtained from a subject. The at least one biomarker can be a classifier biomarker provided herein. The detection can be at the nucleic acid level or protein level. In one embodiment, the detection is at the nucleic acid level and the detection can be by using any amplification, hybridization and/or sequencing assay disclosed herein. In one embodiment, the at least one biomarker detected using the methods and compositions provided herein is selected from Table 1. Further to the above embodiment, the detection of the at least one biomarker selected from Table 1 is at the nucleic acid level. In one embodiment, the method of detecting the biomarker(s) (e.g., classifier biomarkers) in the tumor sample obtained from the subject comprises, consists essentially of, or consists of measuring the expression level of at least one or a plurality of biomarkers using any of the methods provided herein. The biomarkers can be selected from Table 1. In one embodiment, the plurality of biomarker nucleic acids comprises, consists essentially of or consists of at least 4 biomarkers, at least 8 biomarkers, at least 12 biomarkers, at least 16 biomarkers, at least 20 biomarkers, at least 24 biomarkers, at least 28 biomarkers, at least 32 biomarkers, at least 36 biomarkers, at least 40 biomarkers, at least 44 biomarkers, at least 48 biomarkers, at least 52 biomarkers, at least 56 biomarkers, at least 60 biomarkers, at least 64 biomarkers, at least 68 biomarkers, at least 72 biomarkers, at least 76 biomarkers, at least 80 biomarkers or all 84 biomarkers of Table 1.

In another embodiment, the plurality of biomarkers comprises, consists essentially of or consists of at least 8 biomarkers, at least 16 biomarkers, at least 24 biomarkers, at least 32 biomarkers, at least 40 biomarkers, at least 48 biomarkers, at least 56 biomarkers, at least 64 biomarkers, at least 72 biomarkers, at least 80 biomarkers or all 84 biomarkers of Table 1.

In another embodiment, the methods and compositions provided herein allow for the detection of at least one or a plurality of biomarkers selected from the biomarkers listed in Table 1 in combination with the detection of at least one or a plurality of biomarkers from one or more additional sets of biomarkers in a tumor sample obtained from a subject. The tumor sample can be any type of sample provided herein. The subject can be suffering from or suspected of suffering from cancer. The cancer can be any type of cancer provided herein. The detection can be at the nucleic acid level or protein level. In one embodiment, the detection is at the nucleic acid level and the detection can be by using any amplification, hybridization and/or sequencing assay disclosed herein. The one or more additional sets of biomarkers can be selected from a set of biomarkers whose presence, absence and/or level of expression is indicative of immune activation, proliferation, a tissue of origin cancer subtype, or any combination thereof. The additional set of biomarkers for indicating immune activation can be gene expression signatures of Adaptive Immune Cells (AIC) and/or Innate Immune Cells (IIC), individual immune biomarkers, interferon genes, major histocompatibility complex, class II (MHC II) genes or a combination thereof. The gene expression signatures of both IIC and AIC can be any gene signatures known in the art such as, for example, the gene signatures listed in Thorsson, V. et al., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell reports, 18, 248-262 (2017) or WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference in its entirety.

The additional set of biomarkers for indicating proliferation can be gene expression signatures that include the 11 gene signature comprising BIRC5, CCNB1, CDC20, CDCA1, CEP55, KNTC2, MKI67, PTTG1, RRM2, TYMS, and UBE2C found in Martin M. et al., Breast Cancer Res Treat, 138: 457-466 (2013), the 18 gene signature found in US20160115551 and/or the 26 gene signature found in 62/789,668 filed Jan. 8, 2019. The additional set of biomarkers for determining tissue of origin cancer subtypes can be any gene signature found in the art for subtyping specific tissue of origin cancers. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the adenocarcinoma lung cancer subtyping gene expression signatures found in WO2017/201165, US20170114416 or U.S. Pat. No. 8,822,153. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the squamous cell carcinoma lung cancer subtyping gene expression signatures found in WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the breast cancer subtyping gene expression signatures found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the bladder cancer subtyping gene expression signatures found in 62/629,975 filed Feb. 13, 2018. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the bladder cancer subtyping gene expression signatures found in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is a head and neck squamous cell carcinoma (HNSCC) subtyping gene expression signature selected from PCT/US18/45522 or PCT/US18/48862.

Further to any of the above embodiments, the methods and compositions provided herein further comprise determining tumor mutation burden (TMB) and/or TMB rate of the tumor sample. The TMB and/or TMB rate can be determined or calculated using any method known in the art. In one embodiment, the TMB and/or TMB rate is determined from RNA as described in 62/743,257 filed on Oct. 9, 2018 and 62/771,702 filed on Nov. 27, 2018.

Kits

Kits for practicing the methods provided herein can be further provided. The term “kit” can encompass any manufacture (e.g., a package or a container) comprising at least one reagent, e.g., an antibody, a nucleic acid probe or primer, etc., for specifically detecting the expression of a biomarker provided herein. The kit may be promoted, distributed, or sold as a unit for performing the methods provided herein. Additionally, the kits may contain a package insert describing the kit and methods for its use.

In one embodiment, kits for practicing the methods provided herein are provided. Such kits are compatible with both manual and automated immunocytochemistry techniques (e.g., cell staining). These kits comprise at least one antibody directed to a biomarker of interest, chemicals for the detection of antibody binding to the biomarker, a counterstain, and, optionally, a bluing agent to facilitate identification of positive staining cells. Any chemicals that detect antigen-antibody binding may be used in the practice of the methods provided herein. The kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more antibodies for use in the methods provided herein.

In one embodiment, the kits for practicing the methods provided herein comprise at least one primer pair directed to a biomarker of interest, chemicals for the detection of amplification of the biomarker of interest, and, optionally, any agent necessary for quantifying the detection level of the biomarker of interest. Any chemicals that detect amplification products may be used in the practice of the methods provided herein. The kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more primer pairs for use in the methods provided herein.

In one embodiment, the kits for practicing the methods provided herein comprise at least one probe directed to a biomarker of interest, chemicals for the detection of hybridization of the probe to the biomarker of interest, and, optionally, any agent necessary for quantifying the level of the biomarker of interest. Any chemicals that detect hybridization products may be used in the practice of the methods provided herein. The kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more probes for use in the methods provided herein.

EXAMPLES

The present invention is further illustrated by reference to the following Examples. However, it should be noted that these Examples, like the embodiments described above, are illustrative and are not to be construed as restricting the scope of the invention in any way.

Example 1—Development and Validation of the 84-Gene Pan Cancer Subtyping Signature

Background

Recent genomic analyses of pathologically-defined tumor types have identified disease subtypes within a tissue. The extent to which genomic signatures are shared across tumorous tissues remains unclear.

Provided within this example is the development and validation of an 84-gene signature that can be used in a method for classifying a tumor sample obtained from a patient as one of 21 possible integrated, pan-cancer cluster of cluster assignment (COCA) subtypes, thereby providing valuable insight into tumor biology and potential therapeutic response. The 21 COCA subtypes that can be determined using the gene signature developed herein alone are listed in FIG. 1 and are designated as C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).

Objective

This example was initiated to address the need for an efficient method for improved tumor classification based on cell-of-origin that could inform prognosis, drug response and patient management based on underlying genomic and biologic tumor characteristics. Using the data associated with the 2018 TCGA Pan-cancer publications (https://gdc.cancer.gov/about-data/publications/pancanatlas) and comparing to the multi-platform cluster of cluster assignment (COCA) analysis performed in Hoadley et al., Cell. 2018 Apr. 5; 173(2):291-304 (hereinafter referred to as the “Gold Standard” for COCA subtyping), a pan-cancer COCA subtyping signature was developed. The gene signature developed in this example can be used in diagnostic methods that include evaluation of gene expression subtypes and application of an algorithm for categorization of a tumor sample obtained from a subject into one of 21 COCA subtypes: C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).

Methods/Results

To develop the aforementioned pan-cancer COCA subtyper, data associated with the 2018 TCGA Pan-cancer publications (https://gdc.cancer.gov/about-data/publications/pancanatlas) was downloaded. In particular, the expression data from primary solid tumor samples (n=8545; primary solid tumor per TCGA barcode) that had expression data from the “EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2” platform (i.e., EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2-v2.geneExp.tsv) from the TCGA dataset was used, as were the merged sample quality annotations (i.e., merged_sample_quality_annotations.tsv). Data from “do_not_use=False” specified in the sample quality file (merged_sample_quality_annotations.tsv) as well as data from samples from the pilot study (designated tumor type=“FFFP”) were excluded. The 8545 samples were from 32 tumor types. The 32 tumor types were kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); and Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).
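The sample-selection step described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used: it relies on the TCGA barcode convention in which the fourth hyphen-separated field begins with a two-digit sample-type code, with 01 denoting a primary solid tumor. The record keys `barcode` and `tumor_type` are hypothetical names chosen for the sketch and are not the column names of merged_sample_quality_annotations.tsv.

```python
def sample_type_code(barcode):
    """Return the two-digit sample-type code from a TCGA barcode."""
    # Barcode fields are hyphen-separated: project, tissue source site,
    # participant, then sample type plus vial (e.g. "01A").
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    return fields[3][:2]


def select_primary_tumors(records, excluded_tumor_types=("FFFP",)):
    """Keep primary solid tumor records whose tumor type is not excluded.

    `records` is a list of dicts with the hypothetical keys
    "barcode" and "tumor_type".
    """
    kept = []
    for record in records:
        if sample_type_code(record["barcode"]) != "01":
            continue  # not a primary solid tumor
        if record["tumor_type"] in excluded_tumor_types:
            continue  # pilot-study sample
        kept.append(record)
    return kept
```

The quality-flag exclusion described in the text is deliberately left out of the sketch, because it depends on the annotation file's actual column names and values.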

The COCA subtypes (i.e., COCA_Sample_Assignment_n9759.csv) from Hoadley et al. (Cell. 2018 Apr. 5; 173(2):291-304) were then assigned to the 8545 samples from the TCGA data described above, excluding COCA subtypes with 30 or fewer samples. FIG. 1 shows the cross-tabulation of the TCGA tumor type and COCA subtype from the Hoadley et al., 2018 paper for samples with qualifying expression data as described herein. FIG. 1 also provides the integrated COCA subtypes and their designations as provided herein.
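The exclusion of sparsely populated COCA subtypes can be illustrated with a short sketch. The function name and the mapping of sample identifiers to subtype labels are assumptions made for illustration; only the rule itself (drop subtypes with 30 or fewer samples) comes from the description above.

```python
from collections import Counter


def drop_small_subtypes(assignments, max_excluded_size=30):
    """Remove samples whose subtype has `max_excluded_size` or fewer members.

    `assignments` maps a sample identifier to its COCA subtype label.
    """
    counts = Counter(assignments.values())
    return {
        sample: subtype
        for sample, subtype in assignments.items()
        if counts[subtype] > max_excluded_size
    }
```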

To develop the reduced and clinically applicable pan cancer COCA subtyper, the 8545 samples from the TCGA dataset described above (and the RNA-seq expression data associated therewith) were divided into a training set (⅔ of the data set; n=5696) and a test set (⅓ of the data set; n=2849), balancing for uniform tumor type of origin distributions (see the Table in FIG. 2). Gene expression values were log2 transformed and genes with low variance and/or low mean were filtered out, while genes with mean variance and mean expression values greater than 4 were kept, resulting in gene expression data for 2190 genes (see graph in FIG. 2). It should be noted that samples that were found to have a COCA subtype 5 (C5; n=41) using the gold standard COCA subtyper described in Hoadley et al., 2018 were excluded from the training set due to the presence of a small number of samples that were not well differentiated by gene expression. As a result, the training set subsequently used to generate the COCA subtyper via cross-validation and classification to the nearest centroid (ClaNC (Dabney, 2006)) had an n of 5655 samples (5696 less the 41 C5 samples).
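The split and filtering steps above can be sketched in a few lines. The sketch rests on several assumptions that the description does not fix: the split is made per tumor type with a seeded shuffle, a pseudocount of 1 is added before the log2 transform, and a gene is kept when both the mean and the variance of its log2 values exceed a threshold. All function and parameter names are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def stratified_split(sample_types, train_fraction=2 / 3, seed=0):
    """Split samples into train/test lists, balancing by tumor type."""
    by_type = defaultdict(list)
    for sample, tumor_type in sorted(sample_types.items()):
        by_type[tumor_type].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for tumor_type in sorted(by_type):
        members = by_type[tumor_type]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def filter_genes(expression, min_mean=4.0, min_variance=4.0, pseudocount=1.0):
    """Log2-transform each gene and keep those passing both thresholds.

    `expression` maps a gene name to a list of non-negative values.
    The pseudocount and the exact thresholds are assumptions.
    """
    kept = {}
    for gene, values in expression.items():
        logged = [math.log2(value + pseudocount) for value in values]
        mean = statistics.mean(logged)
        variance = statistics.pvariance(logged)
        if mean > min_mean and variance > min_variance:
            kept[gene] = logged
    return kept
```

Splitting within each tumor type keeps the train and test tumor-type distributions close to each other, which is the balancing property the text describes.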

As mentioned, a Classification to Nearest Centroid (ClaNC) algorithm (see Alan R. Dabney; ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22, Issue 1, 1 Jan. 2006, Pages 122-123) was applied to the gene expression data from the training set (n=5655) in order to choose different numbers of genes per subtype (see FIG. 3) that were subsequently tested using 5-fold cross-validation (CV) to find the minimum number of genes that would be required to provide differentiation of the aforementioned COCA subtypes with sufficient agreement with the previously developed gold standard (i.e., COCA analysis on multiplatform ‘omic’ data as described in Hoadley et al., 2018). As shown in FIG. 3, said 5-fold cross validation suggested that 4 genes per subtype for a total of 84 genes (i.e., for the 21 COCA subtypes described herein) would achieve sufficient agreement between the classifier prediction and COCA subtype as determined using the gold standard method from Hoadley et al., 2018.

Regarding selection of the final 84 genes (i.e., 4 genes/COCA subtype) to be included in the 21 class COCA subtyper, the ClaNC software package (see Dabney, 2006) used on the entire training set calculated t-statistics and 84 genes were selected based on the ranks of the strongest t-statistics (i.e., both negatively and positively correlated genes for each COCA subtype can be and were selected) (see Table 1). Then an ordinary nearest centroid classifier was fit using the 21 COCA classes and 84 genes.
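The gene-selection and classification steps above can be illustrated with a simplified sketch. It is not the ClaNC implementation: it assumes a one-vs-rest Welch-style t-statistic for ranking, picks the top genes per class by absolute value (so both positively and negatively associated genes qualify), and assumes Euclidean distance for the nearest centroid rule. All function names are hypothetical.

```python
import math
import statistics


def one_vs_rest_t(values, labels, target):
    """Welch-style t-statistic of one gene for `target` versus all others.

    Requires at least two samples in each group and non-zero variance.
    """
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    standard_error = math.sqrt(
        statistics.variance(inside) / len(inside)
        + statistics.variance(outside) / len(outside)
    )
    return (statistics.mean(inside) - statistics.mean(outside)) / standard_error


def select_genes(expression, labels, genes_per_class):
    """Pick the strongest genes per class without reusing a gene."""
    chosen = []
    for target in sorted(set(labels)):
        ranked = sorted(
            (gene for gene in expression if gene not in chosen),
            key=lambda gene: abs(one_vs_rest_t(expression[gene], labels, target)),
            reverse=True,
        )
        chosen.extend(ranked[:genes_per_class])
    return chosen


def fit_centroids(expression, labels, genes):
    """Return the per-class mean expression vector over the chosen genes."""
    return {
        target: [
            statistics.mean(
                v for v, lab in zip(expression[gene], labels) if lab == target
            )
            for gene in genes
        ]
        for target in set(labels)
    }


def classify(profile, centroids):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda target: math.dist(profile, centroids[target]))
```

With 21 classes and `genes_per_class=4`, this selection scheme would yield 84 genes, matching the size of the signature described in the text.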

Validation of the reduced gene signature was performed by applying the 84-gene nearest centroid classifier of Table 1 to the test set (n=2849) and comparing the COCA subtypes as determined by the gold standard vs. the 84-gene classifier or signature (i.e., Table 1). As shown in FIG. 4, the test set showed an overall agreement of 90%, which was similar to the agreement with COCA gold standard (GS) subtyping of 91% for the training set. FIG. 5 shows that the 84 gene nearest centroid classifier called a vast majority of the COCA subtypes in the test set correctly.
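The agreement figures reported above are simple proportions of matching calls. A minimal sketch of how overall and per-subtype agreement could be computed is given below; the function names are hypothetical and the inputs are parallel lists of gold standard and predicted subtype labels.

```python
from collections import Counter


def overall_agreement(gold, predicted):
    """Fraction of samples where the predicted subtype equals the gold standard."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def per_subtype_agreement(gold, predicted):
    """Agreement within each gold standard subtype."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {subtype: hits[subtype] / totals[subtype] for subtype in totals}
```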

Conclusion

Development and validation of an 84-gene signature for COCA subtyping was described. The resulting 84 gene signature maintains high concordance rates with the gold standard COCA subtyper as described in the art.

Subtypes provide potential biomarkers for targeted and immunotherapy response. The data demonstrate differences in prognosis that may be meaningful to therapeutic management.

Example 2—Examination of the COCA Subtype Signature as a Prognostic Indicator

Objective

This example describes the examination of the 84 gene COCA subtyper developed in Example 1 and found in Table 1 as a prognostic indicator for overall survival. Overall, the goal of the studies in this example was to determine if the 84-gene COCA signature has prognostic value across a myriad of tumor types.

Methods and Results

In order to determine if the 84 gene signature of Table 1 has prognostic utility, associations between overall survival and the 84 gene COCA signature were examined within specific tumor types (i.e., BLCA, BRCA and STAD). Associations between overall survival and the 84 gene signature were examined separately within each tumor type by fitting Cox models adjusted for age at diagnosis and stage, with overall survival as the outcome and classifier subtype as the predictor, reporting hazard ratios for classifier subtype, and testing (Wald's test) whether the coefficient for classifier subtype was different from zero. It should be noted that the association tests used only subtype categories having many samples. For example, BLCA tumors were classified into 8 predicted subtype categories (C10, C15, C16, C20, C25, C4, C8, C9; see FIG. 6) but 92% (345/375) were in two of them (C16 and C4), and only these categories were analyzed.
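The relationship between a fitted Cox coefficient, its hazard ratio and the Wald test can be sketched as follows. The sketch does not fit a Cox model; it assumes a coefficient and its standard error are already available from such a fit, and the function name and the values used to call it are hypothetical rather than taken from the figures.

```python
import math


def wald_summary(coefficient, standard_error):
    """Hazard ratio, z statistic and two-sided p-value for one coefficient."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = coefficient / standard_error
    # Two-sided tail probability under a standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "hazard_ratio": math.exp(coefficient),  # HR = exp(beta)
        "z": z,
        "p_value": p_value,
    }
```

A hazard ratio above 1 corresponds to a positive coefficient (worse survival for the subtype relative to the reference), and a hazard ratio below 1 to a negative coefficient.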

As shown in FIGS. 6-8, specific COCA subtypes can be associated with overall survival. For example, as shown in FIG. 6, the C4 COCA subtype was significantly associated with worse overall survival in BLCA (association test p-value for C4 subtype as determined using Table 1 gene signature was 0.0204, while the Hazard ratio was 1.53 (i.e., second column); FIG. 6), while the C8 COCA subtype in STAD (association test p-value for C8 subtype as determined using Table 1 gene signature was 0.00689, while the Hazard ratio was 1.67; FIG. 8) samples was also associated with worse overall survival. In contrast, the C24 COCA subtype in the BRCA sample had better overall survival (association test p-value was 0.00013, while the Hazard ratio was 0.37; FIG. 7).

INCORPORATION BY REFERENCE

The following references are incorporated by reference in their entireties for all purposes.

-   Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
-   Hoadley, Katherine A., Christina Yau, Denise M. Wolf, Andrew D. Cherniack, David Tamborero, Sam Ng, Max D M Leiserson et al. “Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.” Cell 158, no. 4 (2014): 929-944.
-   Alan R. Dabney; ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22, Issue 1, 1 Jan. 2006, Pages 122-123.
-   Alan R. Dabney; Classification of microarrays to nearest centroids, Bioinformatics, Volume 21, Issue 22, 15 Nov. 2005, Pages 4148-4154.

Further Numbered Embodiments of the Disclosure

Other subject matter contemplated by the present disclosure is set out in the following numbered embodiments:

1. A method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype.

2. The method of embodiment 1, wherein the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step.

3. The method of embodiment 2, wherein the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.

4. The method of any one of embodiments 1-3, wherein the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype indicates that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype indicates that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype indicates that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype indicates that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype indicates that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype indicates that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype indicates that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype indicates that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype indicates that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype indicates that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype indicates that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype indicates that a tumor sample is substantially similar to or is thyroid cancer.

5. The method of any one of embodiments 1-4, wherein the expression level of the classifier biomarker is detected at the nucleic acid level.

6. The method of embodiment 5, wherein the nucleic acid level is RNA or cDNA.

7. The method of embodiment 5 or 6, wherein the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.

8. The method of embodiment 7, wherein the expression level is detected by performing RNAseq.

9. The method of embodiment 8, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.

10. The method of any one of embodiments 1-9, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.

11. The method of embodiment 10, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.

12. The method of any one embodiments 1-11, wherein the at least oneclassifier biomarker comprises a plurality of classifier biomarkers.

13. The method of embodiment 12, wherein the plurality of classifierbiomarkers comprises, consists essentially of or consists of at least 2classifier biomarkers, at least 4 classifier biomarkers, at least 6classifier biomarkers, at least 8 classifier biomarkers, at least 10classifier biomarkers, at least 12 classifier biomarkers, at least 14classifier biomarkers, at least 16 classifier biomarkers, at least 18classifier biomarkers, at least 20 classifier biomarkers, at least 30classifier biomarkers, at least 40 classifier biomarkers, at least 50classifier biomarkers, at least 60 classifier biomarkers, at least 70classifier biomarkers or at least 80 classifier biomarkers of Table 1.

14. The method of any one of embodiments 1-13, wherein the at least oneclassifier biomarker comprises, consists essentially of or consists ofall the classifier biomarkers of Table 1.

15. A method of detecting a biomarker in a tumor sample obtained from a patient, the method comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay.

16. The method of embodiment 15, wherein the patient is suffering from or is suspected of suffering from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); or Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).

17. The method of embodiment 15 or 16, wherein the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.

18. The method of embodiment 17, wherein the expression level is detected by performing RNAseq.

19. The method of embodiment 18, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers per each of the plurality of biomarker nucleic acids selected from Table 1.

20. The method of any one of embodiments 15-19, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.

21. The method of embodiment 20, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.

22. The method of any one of embodiments 15-21, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.

23. The method of any one of embodiments 15-22, wherein the plurality of biomarker nucleic acids comprises, consists essentially of or consists of all the classifier biomarker nucleic acids of Table 1.

24. A method of treating cancer in a subject, the method comprising:

measuring the expression level of at least one biomarker nucleic acid in a tumor sample obtained from the subject, wherein the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer.

25. The method of embodiment 24, wherein the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.

26. The method of embodiment 24 or 25, further comprising measuring the expression of at least one biomarker from an additional set of biomarkers.

27. The method of embodiment 26, wherein the additional set of biomarkers comprises at least an immune cell signature, a cell proliferation signature, or drug target genes.

28. The method of any one of embodiments 24-27, wherein the measuring the expression level is conducted using an amplification, hybridization and/or sequencing assay.

29. The method of embodiment 28, wherein the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.

30. The method of embodiment 29, wherein the expression level is detected by performing RNAseq.

31. The method of any one of embodiments 24-30, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.

32. The method of embodiment 31, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.

33. The method of any one of embodiments 24-32, wherein the subject's COCA subtype is selected from C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28.

34. The method of embodiment 33, wherein the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype indicates that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype indicates that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype indicates that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype indicates that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype indicates that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype indicates that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype indicates that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype indicates that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype indicates that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype indicates that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype indicates that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype indicates that a tumor sample is substantially similar to or is thyroid cancer.

35. A method of predicting overall survival in a cancer patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient.

36. The method of embodiment 35, wherein the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step.

37. The method of embodiment 36, wherein the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.

38. The method of any one of the embodiments 35-37, wherein the expression level of the classifier biomarker is detected at the nucleic acid level.

39. The method of embodiment 38, wherein the nucleic acid level is RNA or cDNA.

40. The method of any one of embodiments 35-39, wherein the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.

41. The method of embodiment 40, wherein the expression level is detected by performing RNAseq.

42. The method of embodiment 35, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.

43. The method of any one of embodiments 35-42, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.

44. The method of embodiment 43, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.

45. The method of any one of embodiments 35-44, wherein the at least one classifier biomarker comprises a plurality of classifier biomarkers.

46. The method of embodiment 45, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.

47. The method of any one of embodiments 35-46, wherein the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
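
By way of non-limiting illustration, the correlation-based comparing step recited in embodiments 36 and 37 may be sketched in Python as follows. The sketch assumes a nearest-centroid scheme using Pearson correlation, which is only one possible statistical algorithm. The function names, the four placeholder expression values per profile, and the three toy reference centroids are hypothetical and are not taken from Table 1 or from any training set described herein.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def classify_by_correlation(sample, centroids):
    """Return (subtype, correlation) for the best-correlated reference centroid."""
    scores = {subtype: pearson(sample, ref) for subtype, ref in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference centroids over four placeholder biomarkers.
# These numbers are invented for illustration only.
TOY_CENTROIDS = {
    "C1": [5.0, 1.0, 1.0, 2.0],
    "C2": [1.0, 6.0, 2.0, 1.0],
    "C3": [1.0, 2.0, 7.0, 3.0],
}

# Hypothetical expression profile of a test sample over the same biomarkers.
subtype, r = classify_by_correlation([4.5, 1.2, 0.8, 2.1], TOY_CENTROIDS)
print(subtype, round(r, 3))
```

In such a sketch, each reference centroid would be derived from the training set(s) for one COCA subtype, and the sample is assigned to the subtype whose centroid correlates most strongly with the sample's expression profile.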

The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent application, foreign patents, foreign patent application and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, application and publications to provide yet further embodiments.

These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

What is claimed is:
 1. A method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype.
 2. The method of claim 1, wherein the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step.
 3. The method of claim 2, wherein the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
 4. The method of claim 1, wherein the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype indicates that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype indicates that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype indicates that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype indicates that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype indicates that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype indicates that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype indicates that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype indicates that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype indicates that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype indicates that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype indicates that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype indicates that a tumor sample is substantially similar to or is thyroid cancer.
 5. The method of claim 1, wherein the expression level of the classifier biomarker is detected at the nucleic acid level.
 6. The method of claim 5, wherein the nucleic acid level is RNA or cDNA.
 7. The method of claim 5 or 6, wherein the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
 8. The method of claim 7, wherein the expression level is detected by performing RNAseq.
 9. The method of claim 8, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.
 10. The method of claim 1, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
 11. The method of claim 10, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
 12. The method of claim 1, wherein the at least one classifier biomarker comprises a plurality of classifier biomarkers.
 13. The method of claim 12, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 4 classifier biomarkers, at least 6 classifier biomarkers, at least 8 classifier biomarkers, at least 10 classifier biomarkers, at least 12 classifier biomarkers, at least 14 classifier biomarkers, at least 16 classifier biomarkers, at least 18 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
 14. The method of claim 1, wherein the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
 15. A method of detecting a biomarker in a tumor sample obtained from a patient, the method comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay.
 16. The method of claim 15, wherein the patient is suffering from or is suspected of suffering from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); or Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).
 17. The method of claim 15 or 16, wherein the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
 18. The method of claim 17, wherein the expression level is detected by performing RNAseq.
 19. The method of claim 18, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers per each of the plurality of biomarker nucleic acids selected from Table 1.
 20. The method of claim 15, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
 21. The method of claim 20, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
 22. The method of claim 15, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
 23. The method of claim 15, wherein the plurality of biomarker nucleic acids comprises, consists essentially of or consists of all the classifier biomarker nucleic acids of Table 1.
 24. A method of treating cancer in a subject, the method comprising: measuring the expression level of at least one biomarker nucleic acid in a tumor sample obtained from the subject, wherein the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer.
 25. The method of claim 24, wherein the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
 26. The method of claim 24 or 25, further comprising measuring the expression of at least one biomarker from an additional set of biomarkers.
 27. The method of claim 26, wherein the additional set of biomarkers comprises at least an immune cell signature, a cell proliferation signature, or drug target genes.
 28. The method of claim 24, wherein the measuring the expression level is conducted using an amplification, hybridization and/or sequencing assay.
 29. The method of claim 28, wherein the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
 30. The method of claim 29, wherein the expression level is detected by performing RNAseq.
 31. The method of claim 24, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
 32. The method of claim 31, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
 33. The method of claim 24, wherein the subject's COCA subtype is selected from C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28.
 34. The method of claim 33, wherein the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype indicates that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype indicates that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype indicates that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype indicates that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype indicates that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype indicates that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype indicates that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype indicates that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype indicates that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype indicates that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype indicates that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype indicates that a tumor sample is substantially similar to or is thyroid cancer.
 35. A method of predicting overall survival in a cancer patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient.
 36. The method of claim 35, wherein the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step.
 37. The method of claim 36, wherein the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
 38. The method of any one of claims 35-37, wherein the expression level of the classifier biomarker is detected at the nucleic acid level.
 39. The method of claim 38, wherein the nucleic acid level is RNA or cDNA.
 40. The method of claim 35, wherein the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
 41. The method of claim 40, wherein the expression level is detected by performing RNAseq.
 42. The method of claim 35, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.
 43. The method of claim 35, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
 44. The method of claim 43, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
 45. The method of claim 35, wherein the at least one classifier biomarker comprises a plurality of classifier biomarkers.
 46. The method of claim 45, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
 47. The method of claim 35, wherein the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
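
By way of non-limiting illustration, the comparing and classifying steps recited in claims 36 and 37 may be sketched as a correlation against per-subtype reference profiles. The sketch below is an assumption-laden example, not the claimed implementation: the claims recite only "a correlation", so the use of Pearson correlation, the representation of each training set as a single mean profile (centroid), the function name classify_by_correlation, and all numeric values are hypothetical and chosen purely for illustration. Only three of the COCA subtype labels are shown, and the four expression values stand in for an arbitrary panel of classifier biomarkers.

```python
import numpy as np


def classify_by_correlation(sample, centroids):
    """Assign the subtype whose reference profile correlates best with the sample.

    sample    -- expression values for the chosen classifier biomarkers
    centroids -- dict mapping subtype label to a reference profile measured
                 over the same biomarkers in the same order (hypothetical data)
    Returns (best_label, {label: Pearson correlation}).
    """
    sample = np.asarray(sample, dtype=float)
    scores = {}
    for label, centroid in centroids.items():
        centroid = np.asarray(centroid, dtype=float)
        # Pearson correlation is an assumption; the claims say only "a correlation".
        scores[label] = float(np.corrcoef(sample, centroid)[0, 1])
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy reference profiles for three subtypes over four biomarkers.
reference_centroids = {
    "C1": [2.0, 0.5, 1.0, 3.0],
    "C2": [0.5, 2.5, 3.0, 1.0],
    "C3": [1.0, 1.0, 0.2, 0.1],
}

# Hypothetical expression values detected in a test sample.
sample_expression = [1.9, 0.7, 1.2, 2.8]

subtype, correlations = classify_by_correlation(sample_expression, reference_centroids)
print(subtype, correlations)
```

In this toy example the sample is assigned to the reference profile with the highest correlation coefficient. A practical classifier would typically be trained and validated on many reference samples per subtype and may apply a minimum-correlation threshold before making a call; neither detail is specified here.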